System for rendering sound using reflected sound elements
Patent Abstract:
REFLECTED SOUND RENDERING FOR OBJECT-BASED AUDIO. Embodiments are described that render spatial audio content through a system configured to reflect audio from one or more surfaces in a listening environment. The system includes an array of audio drivers distributed around a room, at least one driver of the array being configured to project sound waves toward one or more surfaces of the listening environment for reflection to a listening area within the listening environment, and a renderer configured to receive and process audio streams and one or more sets of metadata that are associated with each of the audio streams and that specify a playback location in the listening environment.
Publication number: BR112015004288B1
Application number: R112015004288-0
Filing date: 2013-08-28
Publication date: 2021-05-04
Inventors: Brett G. Crockett; Spencer Hooks; Alan Seefeldt; Joshua B. Lando; C. Phillip Brown; Sripal S. Mehta; Stewart Murrie
Applicant: Dolby Laboratories Licensing Corporation
Primary IPC class:
Patent Description:
CROSS-REFERENCE TO RELATED APPLICATIONS
[001] This application claims the benefit of priority to U.S. Provisional Patent Application 61/695,893, filed August 31, 2012, which is incorporated herein by reference in its entirety.
FIELD OF THE INVENTION
[002] One or more implementations relate generally to audio signal processing, and more specifically to rendering adaptive audio content through direct and reflected drivers in certain listening environments.
BACKGROUND OF THE INVENTION
[003] The subject matter discussed in the background section should not be assumed to be prior art merely as a result of its mention in the background section. Similarly, a problem mentioned in the background section or associated with the subject matter of the background section should not be assumed to have been previously recognized in the prior art. The subject matter in the background section merely represents different approaches, which in and of themselves may also be inventions.
[004] Movie soundtracks typically comprise many different sound elements corresponding to on-screen images, dialogue, noise, and sound effects that emanate from different places on the screen and combine with background music and ambient effects to create the overall audience experience. Accurate reproduction requires that sounds be reproduced in a way that corresponds as closely as possible to what is shown on the screen with respect to sound source position, intensity, movement, and depth. Traditional channel-based audio systems send audio content in the form of speaker feeds to individual speakers in a playback environment. The introduction of digital cinema has created new standards for cinema sound, such as the incorporation of multiple channels of audio to allow for greater creativity for content creators, and a more immersive and realistic listening experience for audiences. Expanding beyond traditional speaker feeds and channel-based audio as a means for distributing spatial audio is critical, and there has been considerable interest in a model-based audio description that allows the listener to select a playback configuration with the audio rendered specifically for the chosen configuration. To further improve the listening experience, sound reproduction in true three-dimensional (3D) or virtual 3D environments has become an area of increasing research and development. The spatial presentation of sound uses audio objects, which are audio signals with associated parametric source descriptions of apparent source position (e.g., 3D coordinates), apparent source width, and other parameters. Object-based audio can be used for many multimedia applications, such as digital movies, video games, and simulators, and is of particular importance in a home environment where the number of speakers and their placement are often limited or constrained by the confines of a relatively small listening environment.
[005] Various technologies have been developed to improve sound systems in cinema environments and to more accurately capture and reproduce the creator's artistic intent for a motion picture soundtrack. For example, a next-generation spatial audio format (also referred to as "adaptive audio") has been developed that comprises a mix of audio objects and traditional channel-based speaker feeds along with positional metadata for the audio objects. In a spatial audio decoder, the channels are either sent directly to their associated speakers (if the appropriate speakers exist) or downmixed to an existing set of speakers, and audio objects are rendered by the decoder in a flexible way.
The parametric source description associated with each object, such as a positional trajectory in 3D space, is taken as an input along with the number and position of speakers connected to the decoder. The renderer then uses certain algorithms, such as a panning law, to distribute the audio associated with each object across the attached set of speakers. In this way, the authored spatial intent of each object is optimally presented over the specific speaker configuration that is present in the listening environment.
[006] Current spatial audio systems have generally been developed for cinema use, and therefore involve deployment in large rooms and the use of relatively expensive equipment, including arrays of multiple speakers distributed around the listening environment. An increasing amount of movie content that is currently being produced is made available for playback in the home environment through streaming technology and advanced media technology such as Blu-ray, and so on. In addition, emerging technologies such as 3D television and advanced computer games and simulators have encouraged the use of relatively sophisticated equipment, such as large-screen monitors, speaker arrays, and surround sound receivers in homes and other (non-cinema/theater) listening environments. However, equipment cost, installation complexity, and room size are realistic constraints that prevent the full exploitation of spatial audio in most home environments. For example, advanced object-based audio systems typically employ height or overhead speakers to reproduce sound that is intended to originate above a listener's head. In many cases, and especially in the home environment, such height speakers may not be available. In this case, height information is lost if such sound objects are played back only through wall-mounted or floor-mounted speakers.
[007] What is needed, therefore, is a system that allows the full spatial information of an adaptive audio system to be reproduced in a listening environment that may include only a portion of the full speaker array intended for playback, such as limited or no overhead speakers, and that can use reflected speakers to emanate sound from places where direct speakers may not exist.
BRIEF SUMMARY OF EMBODIMENTS
[008] Systems and methods are described for an audio format and system that includes updated content creation tools, distribution methods, and an enhanced user experience based on an adaptive audio system that includes new speaker and channel configurations, as well as a new spatial description format made possible by a set of advanced content creation tools created for cinema sound mixers. Embodiments include a system that expands the cinema-based adaptive audio concept to a particular audio playback ecosystem, including home theater (e.g., A/V receiver, soundbar, and Blu-ray player), E-media (e.g., PC, tablet, mobile device, and headphone playback), broadcast (e.g., TV and set-top box), music, games, live sound, user-generated content ("UGC"), and so on. The home environment system includes components that provide compatibility with theatrical content, and features metadata definitions that include content creation information to convey creative intent, media intelligence information regarding audio objects, speaker feeds, spatial rendering information, and content-dependent metadata that indicate content type, such as dialogue, music, ambience, and so on.
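As an illustration of the kinds of metadata definitions described above, the following Python sketch models a per-stream metadata record. The field names are hypothetical and chosen for readability; they are not taken from the actual format specification.

```python
from dataclasses import dataclass

# Hypothetical metadata record for one audio stream, loosely following the
# metadata categories named above; all field names are illustrative only.
@dataclass
class StreamMetadata:
    content_type: str                  # content-dependent: "dialog", "music", "ambience", ...
    position: tuple = (0.0, 0.0, 0.0)  # playback location as 3D coordinates (x, y, z)
    size: float = 0.0                  # apparent source size (0 = point source)
    use_reflected: bool = False        # hint to route through upward/side-firing drivers
    creative_intent: str = ""          # free-form content creation notes

dialog = StreamMetadata(content_type="dialog", position=(0.5, 0.0, 0.0))
overhead_fx = StreamMetadata(content_type="effect", position=(0.5, 0.5, 1.0),
                             use_reflected=True)  # height cue -> reflected driver
```

A renderer could, for instance, inspect use_reflected and content_type to decide which class of drivers should receive each stream.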
Adaptive audio definitions can include standard speaker feeds via audio channels, plus audio objects with associated spatial rendering information (such as size, velocity, and location in three-dimensional space). A novel speaker configuration (or channel configuration) and an accompanying new spatial description format that will support multiple rendering technologies are also described. Audio streams (usually including channels and objects) are transmitted along with metadata that describes the intent of the sound mixer or content creator, including the desired position of the audio stream. The position can be expressed as a named channel (from within the predefined channel configuration) or as 3D spatial position information. This channels-plus-objects format provides the best of both the model-based and channel-based audio scene description methods.
[009] Embodiments are specifically directed to a system for rendering sound using reflected sound elements that comprises an array of audio drivers for distribution around a listening environment, some of the drivers being direct drivers and others being reflected drivers that are configured to project sound waves toward one or more surfaces of the listening environment for reflection to a specific listening area; a renderer for processing audio streams and one or more sets of metadata that are associated with each audio stream and that specify a playback location in the listening environment for a respective audio stream, the audio streams comprising one or more reflected audio streams and one or more direct audio streams; and a playback system for rendering the audio streams to the array of audio drivers in accordance with the one or more sets of metadata, whereby the one or more reflected audio streams are transmitted to the reflected audio drivers.
INCORPORATION BY REFERENCE
[0010] Each publication, patent, and/or patent application mentioned in this specification is incorporated herein by reference in its entirety as if each individual publication and/or patent application were specifically and individually indicated as being incorporated by reference.
BRIEF DESCRIPTION OF THE DRAWINGS
[0011] In the following drawings, like reference numbers are used to refer to like elements. Although the following figures depict various examples, the one or more implementations are not limited to the examples depicted in the figures.
[0012] Figure 1 illustrates an exemplary speaker placement in a surround system (e.g., 9.1 surround) that provides height speakers for reproduction of height channels.
[0013] Figure 2 illustrates the combination of channel- and object-based data to produce an adaptive audio mix, under an embodiment.
[0014] Figure 3 is a block diagram of a playback architecture for use in an adaptive audio system, under an embodiment.
[0015] Figure 4A is a block diagram that illustrates the functional components for adapting cinema-based audio content for use in a listening environment, under an embodiment.
[0016] Figure 4B is a detailed block diagram of the components of Figure 4A, under an embodiment.
[0017] Figure 4C is a block diagram of the functional components of an adaptive audio environment, under an embodiment.
[0018] Figure 5 illustrates the use of an adaptive audio system in an exemplary home theater environment.
[0019] Figure 6 illustrates the use of an upward-firing driver using reflected sound to simulate a suspended speaker in a listening environment.
[0020] Figure 7A illustrates a speaker that has a plurality of drivers in a first configuration, for use in an adaptive audio system that has a reflected sound renderer, under an embodiment.
[0021] Figure 7B illustrates a speaker system that has drivers distributed in multiple enclosures, for use in an adaptive audio system that has a reflected sound renderer, under an embodiment.
[0022] Figure 7C illustrates an exemplary configuration for a soundbar used in an adaptive audio system using a reflected sound renderer, under an embodiment.
[0023] Figure 8 illustrates an exemplary placement of speakers that have individually addressable drivers, including upward-firing drivers, placed within a listening environment.
[0024] Figure 9A illustrates a speaker configuration for a 5.1 adaptive audio system that uses multiple addressable drivers for reflected audio, under an embodiment.
[0025] Figure 9B illustrates a speaker configuration for a 7.1 adaptive audio system that uses multiple addressable drivers for reflected audio, under an embodiment.
[0026] Figure 10 is a diagram illustrating the composition of a bidirectional interconnection, under an embodiment.
[0027] Figure 11 illustrates an automatic configuration and system calibration process for use in an adaptive audio system, under an embodiment.
[0028] Figure 12 is a flowchart illustrating process steps for a calibration method used in an adaptive audio system, under an embodiment.
[0029] Figure 13 illustrates the use of an adaptive audio system in an exemplary television and soundbar use case.
[0030] Figure 14 illustrates a simplified representation of three-dimensional binaural headphone virtualization in an adaptive audio system, under an embodiment.
[0031] Figure 15 is a table that illustrates certain metadata definitions for use in an adaptive audio system that uses a reflected sound renderer for listening environments, under an embodiment.
[0032] Figure 16 is a graph illustrating the frequency response of a combined filter, under an embodiment.
DETAILED DESCRIPTION OF THE INVENTION
[0033] Systems and methods are described for an adaptive audio system that renders reflected sound for adaptive audio systems that lack overhead speakers. Aspects of the one or more embodiments described herein may be implemented in an audio or audiovisual system that processes source audio information in a playback, rendering, and mixing system that includes one or more computers or processing devices executing software instructions. Any of the described embodiments may be used alone or together with one another in any combination. Although various embodiments may have been motivated by various deficiencies of the prior art, which may be discussed or alluded to in one or more places in the specification, the embodiments do not necessarily address any of these deficiencies. In other words, different embodiments may address different deficiencies that may be discussed in the specification. Some embodiments may only partially address some deficiencies or just one deficiency that may be discussed in the specification, and some embodiments may not address any of these deficiencies.
[0034] For purposes of the present description, the following terms have the associated meanings: the term "channel" means an audio signal plus metadata in which the position is encoded as a channel identifier, e.g., left front or right top surround; "channel-based audio" is audio formatted for playback through a predefined set of speaker zones with associated nominal locations, e.g., 5.1, 7.1, and so on; the term "object" or "object-based audio" means one or more audio channels with a parametric source description, such as apparent source position (e.g., 3D coordinates), apparent source width, etc.; "adaptive audio" means channel-based and/or object-based audio signals plus metadata that render the audio signals based on the playback environment using an audio stream plus metadata in which the position is encoded as a 3D position in space; and "listening environment" means any open, partially enclosed, or fully enclosed area, such as a room, that can be used for playback of audio content alone or with video or other content, and can be embodied in a home, cinema, theater, auditorium, studio, game console, and the like. Such an area may have one or more surfaces disposed in it, such as walls or baffles, that can directly or diffusely reflect sound waves.
ADAPTIVE AUDIO SYSTEM AND FORMAT
[0035] Embodiments are directed to a reflected sound rendering system that is configured to work with a sound format and processing system that may be referred to as a "spatial audio system" or "adaptive audio system," which is based on an audio format and rendering technology to enable deep audience immersion, greater artistic control, and system scalability and flexibility. An overall adaptive audio system generally comprises an audio encoding, distribution, and decoding system configured to generate one or more bitstreams containing both conventional channel-based audio elements and audio object coding elements. Such a combined approach provides greater coding efficiency and rendering flexibility compared to channel-based or object-based approaches taken separately. An example of an adaptive audio system that can be used in conjunction with the present embodiments is described in pending U.S. Provisional Patent Application No. 61/636,429, filed April 20, 2012 and entitled "System and Method for Adaptive Audio Signal Generation, Coding and Rendering," which is incorporated herein by reference in its entirety.
[0036] An exemplary implementation of an adaptive audio system and associated audio format is the Dolby® Atmos™ platform. Such a system incorporates a height (up/down) dimension that may be implemented as a 9.1 surround system, or a similar surround sound configuration. Figure 1 illustrates the speaker placement in a present surround system (e.g., 9.1 surround) that provides height speakers for playback of height channels. The speaker configuration of the 9.1 system 100 is composed of five speakers 102 in the floor plane and four speakers 104 in the height plane. In general, these speakers can be used to produce sound that is designed to emanate from any position more or less accurately within the listening environment. Predefined speaker configurations, such as those shown in Figure 1, can naturally limit the ability to accurately represent the position of a given sound source. For example, a sound source cannot be panned further left than the left speaker itself.
This applies to every speaker, thereby forming a one-dimensional (e.g., left-right), two-dimensional (e.g., front-back), or three-dimensional (e.g., left-right, front-back, up-down) geometric shape in which the downmix is constrained. Various different speaker configurations and types can be used in such a speaker configuration. For example, certain enhanced audio systems may use speakers in a 9.1, 11.1, 13.1, 19.4, or other configuration. Speaker types can include full-range direct speakers, speaker arrays, surround speakers, subwoofers, tweeters, and other types of speakers.
[0037] Audio objects can be considered as groups of sound elements that can be perceived as emanating from a particular physical location or locations in the listening environment. Such objects can be static (i.e., stationary) or dynamic (i.e., moving). Audio objects are controlled by metadata that defines the position of the sound at a given point in time, along with other functions. When objects are played back, they are rendered according to the positional metadata using the speakers that are present, rather than necessarily being output to a predefined physical channel. A track in a session can be an audio object, and standard panning data is analogous to positional metadata. In this way, content placed on the screen can effectively pan in the same way as channel-based content, but content placed in the surrounds can be rendered to an individual speaker if desired. While the use of audio objects provides the desired control for discrete effects, other aspects of a soundtrack can work effectively in a channel-based environment. For example, many ambience or reverb effects actually benefit from being fed to arrays of speakers. Although these could be treated as objects with sufficient width to fill an array, it is beneficial to retain some channel-based functionality.
[0038] The adaptive audio system is configured to support "beds" in addition to audio objects, where beds are effectively channel-based submixes or stems. These can be delivered for final playback (rendering) individually, or combined into a single bed, depending on the intent of the content creator. These beds can be created in different channel-based configurations, such as 5.1, 7.1, and 9.1, and arrays that include overhead speakers, as shown in Figure 1. Figure 2 illustrates the combination of channel- and object-based data to produce an adaptive audio mix, under an embodiment. As shown in process 200, the channel-based data 202, which, for example, may be 5.1 or 7.1 surround sound data provided in the form of pulse-code modulated (PCM) data, is combined with audio object data 204 to produce an adaptive audio mix 208. The audio object data 204 is produced by combining the elements of the original channel-based data with associated metadata that specifies certain parameters pertaining to the location of the audio objects. As shown conceptually in Figure 2, the authoring tools provide the ability to create audio programs that contain a combination of speaker channel groups and object channels simultaneously. For example, an audio program could contain one or more speaker channels optionally organized into groups (or tracks, e.g., a stereo or 5.1 track), descriptive metadata for one or more speaker channels, one or more object channels, and descriptive metadata for one or more object channels.
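To make the bed-plus-object combination of Figure 2 concrete, the following minimal Python sketch mixes a channel bed with one object rendered by a simple constant-power panning law. The panning law and the placeholder signals are illustrative stand-ins, since the text does not prescribe a specific rendering algorithm.

```python
import numpy as np

def pan_constant_power(mono, x):
    """Very simplified 1D panning law: x in [-1, 1] maps left..right.
    Real renderers pan in 3D over whatever speaker set is attached."""
    theta = (x + 1.0) * np.pi / 4.0            # map -1..1 to 0..pi/2
    return np.cos(theta) * mono, np.sin(theta) * mono

fs = 48000
t = np.arange(fs) / fs
bed_L = 0.1 * np.sin(2 * np.pi * 220 * t)      # placeholder stereo bed content
bed_R = 0.1 * np.sin(2 * np.pi * 330 * t)
obj = 0.2 * np.sin(2 * np.pi * 880 * t)        # placeholder object content

obj_L, obj_R = pan_constant_power(obj, x=-0.5) # object metadata: halfway left
mix_L = bed_L + obj_L                          # adaptive mix = bed + rendered objects
mix_R = bed_R + obj_R
```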
[0039] An adaptive audio system effectively moves beyond simple "speaker feeds" as a means for distributing spatial audio, and advanced model-based audio descriptions have been developed that allow the listener the freedom to select a playback configuration that suits their individual needs or budget and have the audio rendered specifically for their individually chosen configuration. At a high level, there are four main spatial audio description formats: (1) speaker feed, in which the audio is described as signals intended for speakers located at nominal speaker positions; (2) microphone feed, in which the audio is described as signals captured by actual or virtual microphones in a predefined configuration (the number of microphones and their relative positions); (3) model-based description, in which the audio is described in terms of a sequence of audio events at described times and positions; and (4) binaural, in which the audio is described by the signals that arrive at the two ears of a listener.
[0040] The four description formats are often associated with the following common rendering technologies, where the term "rendering" means conversion to electrical signals used as speaker feeds: (1) panning, in which the audio stream is converted to speaker feeds using a set of panning laws and known or assumed speaker positions (typically rendered before distribution); (2) Ambisonics, in which the microphone signals are converted to feeds for a scalable array of speakers (typically rendered after distribution); (3) Wave Field Synthesis (WFS), in which the sound events are converted to speaker signals suitable for synthesizing a sound field (typically rendered after distribution); and (4) binaural, in which the binaural L/R signals are delivered to the L/R ears, typically through headphones, but also through speakers in conjunction with crosstalk cancellation.
[0041] In general, any format can be converted to another format (although this may require blind source separation or similar technology) and rendered using any of the aforementioned technologies; however, not all transformations produce good results in practice. The speaker feed format is the most common because it is simple and effective. The best sonic results (i.e., the most accurate and reliable) are achieved by mixing/monitoring in and then distributing the speaker feeds directly, since no processing is required between the content creator and the listener. If the playback system is known in advance, a speaker feed description provides the highest fidelity; however, the playback system and its configuration are often not known in advance. In contrast, the model-based description is the most adaptable because it makes no assumptions about the playback system and is therefore most easily applied to multiple rendering technologies. The model-based description can efficiently capture spatial information, but it becomes very inefficient as the number of audio sources increases.
[0042] The adaptive audio system combines the benefits of both channel- and model-based systems, with specific benefits including high timbre quality, optimal reproduction of artistic intent when mixing and rendering using the same channel configuration, a single inventory with "downward" adaptation to the rendering configuration, relatively low impact on the system pipeline, and greater immersion through finer horizontal speaker spatial resolution and new height channels.
The adaptive audio system provides several new features, including: a single inventory with downward and upward adaptation to a specific cinema rendering configuration, i.e., delayed rendering and optimal use of the available speakers in a playback environment; greater envelopment, including improved downmixing to avoid inter-channel correlation (ICC) artifacts; increased spatial resolution through steerable arrays (e.g., allowing an audio object to be dynamically assigned to one or more speakers within a surround array); and increased front channel resolution through a high-resolution center or similar speaker configuration.
[0043] The spatial effects of audio signals are critical in providing an immersive experience for the listener. Sounds that are intended to emanate from a specific region of a viewing screen or listening environment must be played through speaker(s) located in the same relative location. Thus, the primary audio datum of a sound event in a model-based description is position, although other parameters such as size, orientation, velocity, and acoustic dispersion can also be described. To convey position, a model-based 3D audio spatial description requires a 3D coordinate system. The coordinate system used for transmission (Euclidean, spherical, cylindrical) is generally chosen for convenience or compactness; however, other coordinate systems can be used for the rendering processing. In addition to a coordinate system, a frame of reference is required to represent the locations of objects in space. For systems to accurately reproduce position-based sound in a variety of different environments, the selection of the proper frame of reference can be critical. With an allocentric frame of reference, an audio source position is defined relative to features within the rendering environment, such as the room walls and corners, standard speaker locations, and screen location. In an egocentric frame of reference, locations are represented relative to the listener's perspective, such as "in front of me," "slightly to the left," and so on. Scientific studies of spatial perception (audio and otherwise) have shown that the egocentric perspective is used almost universally. For cinema, however, the allocentric frame of reference is generally more appropriate. For example, the precise location of an audio object is most important when there is an associated object on the screen. When using an allocentric reference, for every listening position and for any screen size, the sound will be located at the same relative position on the screen, e.g., "one third to the left of the middle of the screen." Another reason is that mixers tend to think and mix in allocentric terms, and panning tools are laid out with an allocentric frame (i.e., the room walls), and mixers expect them to be rendered that way, for example, "this sound must be on the screen," "this sound must be off the screen," or "the left wall," and so on.
[0044] Despite the use of the allocentric frame of reference in the cinema setting, there are some cases in which an egocentric frame of reference may be useful and more appropriate. These include non-diegetic sounds, that is, those that are not present in the "story space," e.g., background music, for which an egocentrically uniform presentation may be desirable. Another case is near-field effects (for example, a mosquito buzzing in the listener's left ear) that require an egocentric representation.
Furthermore, infinitely distant sound sources (and the resulting plane waves) may appear to come from a constant egocentric position (e.g., 30 degrees to the left), and such sounds are easier to describe in egocentric terms than in allocentric terms. In some cases, it is possible to use an allocentric frame of reference as long as a nominal listening position is defined, while some examples require an egocentric representation that cannot yet be rendered. Although an allocentric reference may be more useful and appropriate, the audio representation should be extensible, since many new features, including egocentric representation, may be more desirable in certain applications and listening environments.
[0045] Embodiments of the adaptive audio system include a hybrid spatial description approach that includes a recommended channel configuration for optimal fidelity and for rendering complex or diffuse multi-point sources (e.g., stadium crowd, ambience) using an egocentric reference, plus an allocentric, model-based sound description to efficiently allow for increased spatial resolution and scalability. Figure 3 is a block diagram of a playback architecture for use in an adaptive audio system, under an embodiment. The system of Figure 3 includes processing blocks that perform legacy, object, and channel audio decoding, object rendering, channel remapping, and signal processing before the audio is sent to the post-processing and/or amplification and speaker stages.
[0046] The playback system 300 is configured to render and play back audio content that is generated through one or more capture, pre-processing, authoring, and encoding components. An adaptive audio pre-processor can include source separation and content type detection functionality that automatically generates appropriate metadata through analysis of the input audio. For example, positional metadata can be derived from a multi-channel recording through an analysis of the relative levels of correlated input between channel pairs. Detection of content type, such as "speech" or "music," can be achieved, for example, by feature extraction and classification. Certain authoring tools allow the authoring of audio programs by optimizing the input and codification of the sound engineer's creative intent, allowing the sound engineer to create the final audio mix once, optimized for playback in virtually any playback environment. This can be accomplished through the use of audio objects and positional data that are associated and encoded with the original audio content. In order to accurately place sounds around an auditorium, the sound engineer needs control over how the sound will ultimately be rendered based on the actual constraints and features of the playback environment. The adaptive audio system provides this control by allowing the sound engineer to change how the audio content is designed and mixed through the use of audio objects and positional data. Once the adaptive audio content has been authored and encoded in the appropriate codec devices, it is decoded and rendered in the various components of the playback system 300.
[0047] As shown in Figure 3, (1) legacy surround sound audio 302, (2) object audio including object metadata 304, and (3) channel audio including channel metadata 306 are input to decoder stages 308, 309 within processing block 310. The object metadata is rendered in the object renderer 312, while the channel metadata can be remapped as necessary. Listening environment configuration information 307 is provided to the object renderer and to the channel remapping component.
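As a rough sketch of what the object renderer 312 might do with the listening environment configuration information 307, the following Python fragment weights an object's signal across the configured speakers by proximity to the object's metadata position. The distance-based weighting and speaker coordinates are assumptions made for illustration; they are not the actual rendering algorithm.

```python
import numpy as np

# Hypothetical listening-environment configuration (cf. block 307):
# speaker names and nominal 3D positions discovered at setup time.
SPEAKERS = {"L":  np.array([-1.0,  1.0, 0.0]),
            "R":  np.array([ 1.0,  1.0, 0.0]),
            "LS": np.array([-1.0, -1.0, 0.0]),
            "RS": np.array([ 1.0, -1.0, 0.0])}

def render_object(samples, obj_pos, speakers=SPEAKERS):
    """Distance-based amplitude weighting: a simple stand-in for the real
    object-rendering algorithm, which the text leaves unspecified."""
    names = list(speakers)
    d = np.array([np.linalg.norm(speakers[n] - obj_pos) for n in names])
    w = 1.0 / (d + 1e-3)              # nearer speakers get more level
    w /= np.sqrt(np.sum(w ** 2))      # normalize to constant total power
    return {n: wi * samples for n, wi in zip(names, w)}

feeds = render_object(np.ones(4), np.array([0.2, 0.8, 0.0]))  # mostly front right
```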
The hybrid audio data is then processed through one or more signal processing stages, such as equalizers and limiters 314, before being output to the B-chain processing stage 316 and played back through the speakers 318. System 300 represents an example of a playback system for adaptive audio, and other configurations, components, and interconnections are also possible.
[0048] The system of Figure 3 illustrates an embodiment in which the renderer comprises a component that applies the object metadata to the input audio channels to process object-based audio content together with optional channel-based audio content. Embodiments can also be directed to a case in which the input audio channels comprise legacy channel-based content only, and the renderer comprises a component that generates speaker feeds for transmission to an array of drivers in a surround sound configuration. In this case, the input is not necessarily object-based content, but legacy 5.1 or 7.1 (or other non-object-based) content, such as that provided in Dolby Digital or Dolby Digital Plus, or similar systems.
PLAYBACK APPLICATIONS
[0049] As mentioned above, an initial implementation of the adaptive audio format and system is in the digital cinema (D-cinema) context, which includes content capture (objects and channels) that is authored using novel authoring tools, packaged using an adaptive audio cinema encoder, and distributed using PCM or a proprietary lossless codec via the existing Digital Cinema Initiative (DCI) distribution mechanism. In this case, the audio content is intended to be decoded and rendered in a digital cinema to create an immersive spatial audio cinema experience. However, as with previous cinema improvements, such as analog surround sound, digital multi-channel audio, and so on, there is an imperative to deliver the enhanced user experience provided by the adaptive audio format directly to users in their homes. This requires that certain characteristics of the format and system be adapted for use in more limited listening environments. For example, homes, rooms, small auditoriums, or similar places may have reduced space, acoustic properties, and equipment capabilities compared to a cinema or theater environment. For purposes of the description, the term "consumer-based environment" is intended to include any non-cinema environment comprising a listening environment for use by regular consumers or professionals, such as a home, studio, room, console area, auditorium, and the like. The audio content can be sourced and rendered alone, or it can be associated with graphical content, for example, still images, illuminated displays, video, and so on.
[0050] Figure 4A is a block diagram illustrating the functional components for adapting cinema-based audio content for use in a listening environment, under an embodiment. As shown in Figure 4A, cinema content that typically comprises a movie soundtrack is captured and/or authored using appropriate equipment and tools in block 402. In an adaptive audio system, this content is processed through encoding/decoding and rendering components and interfaces in block 404. The resulting object and channel audio feeds are then sent to the appropriate speakers in the cinema or theater, 406. In system 400, the cinema content is also processed for playback in a listening environment, such as a home theater system, 416.
It is assumed that the listening environment is not as comprehensive or capable of reproducing all of the sound content as intended by the content creator, due to limited space, reduced speaker count, and so on. However, embodiments are directed to systems and methods that allow the original audio content to be rendered in a way that minimizes the constraints imposed by the reduced capacity of the listening environment, and that allow the positional cues to be processed in a way that maximizes the available equipment. As shown in Figure 4A, the cinema audio content is processed through the cinema-to-consumer translator component 408, where it is processed in the consumer content encoding and rendering chain 414. This chain also processes original audio content that is captured and/or authored in block 412. The original content and/or the translated cinema content is then played back in the listening environment, 416. In this way, the relevant spatial information encoded in the audio content can be used to render the sound in a more immersive way, even using the possibly limited speaker configuration of the home or listening environment 416.
[0051] Figure 4B illustrates the components of Figure 4A in greater detail. Figure 4B illustrates an exemplary distribution mechanism for adaptive audio cinema content across an entire audio playback ecosystem. As shown in diagram 420, original cinema and TV content is captured 422 and authored 423 for playback in a variety of different environments to provide a cinema experience 427 or consumer environment experiences 434. Likewise, user-generated content (UGC) or consumer content is captured 424 and authored 425 for playback in the listening environment 434. Cinema content for playback in the cinema environment 427 is processed through known cinema processes 426. However, in system 420, the output of the cinema authoring toolbox 423 also consists of audio objects, audio channels, and metadata that convey the artistic intent of the sound mixer. This can be thought of as a mezzanine-style audio package that can be used to create multiple versions of the cinema content for playback. In one embodiment, this functionality is provided by a cinema-to-consumer adaptive audio translator 430. This translator takes the adaptive audio content as an input and distills from it the audio content and metadata appropriate for the desired consumer endpoints 434. The translator creates separate and possibly different audio and metadata streams depending on the distribution mechanism and the endpoint.
[0052] As shown in the example of system 420, the cinema-to-consumer translator 430 feeds sound-for-picture (broadcast, disc, OTT, etc.) and game audio bitstream creation modules 428. These two modules, which are suitable for delivering cinema content, can feed multiple distribution pipelines 432, all of which can deliver to the consumer endpoints. For example, adaptive audio cinema content can be encoded using a codec suitable for broadcast purposes, such as Dolby Digital Plus, which can be modified to convey channels, objects, and associated metadata, and is transmitted through the broadcast chain via cable or satellite and then decoded and rendered in a home for home theater or television playback. Similarly, the same content could be encoded using a codec suitable for online distribution, where bandwidth is limited, and then streamed over a 3G or 4G mobile network and subsequently decoded and rendered for playback through a mobile device using headphones.
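The endpoint-specific distillation performed by the translator 430 can be pictured with the following hypothetical Python sketch, in which each endpoint profile caps the object count and selects a distribution codec. The profile contents and the "priority" field are assumptions made purely for illustration; the text does not specify how the translator selects content.

```python
# Hypothetical endpoint profiles for a cinema-to-consumer translator:
# each profile caps the object count and names a distribution codec.
ENDPOINT_PROFILES = {
    "home_theater": {"codec": "DD+", "max_objects": 16, "bed": "7.1"},
    "mobile":       {"codec": "DD+", "max_objects": 4,  "bed": "2.0"},
}

def translate(master_objects, endpoint):
    """Keep the most important objects (by an assumed authored 'priority'
    field) for this endpoint, and mark the rest for folding into the bed."""
    prof = ENDPOINT_PROFILES[endpoint]
    ranked = sorted(master_objects, key=lambda o: o["priority"], reverse=True)
    return {"codec": prof["codec"],
            "bed": prof["bed"],
            "objects": ranked[:prof["max_objects"]],
            "folded_into_bed": ranked[prof["max_objects"]:]}

streams = translate([{"name": "fx1", "priority": 7},
                     {"name": "vo",  "priority": 10}], "mobile")
```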
Other content sources, such as TV, live broadcast, games, and music, can also use the adaptive audio format to create and deliver content for a next-generation audio format.
[0053] The system of Figure 4B provides an enhanced user experience across the entire consumer audio ecosystem, which can include home theater (A/V receiver, soundbar, and Blu-ray), E-media (PC, tablet, mobile phone, including headphone playback), broadcast (TV and set-top box), music, games, live sound, user-generated content ("UGC"), and so on. Such a system provides: enhanced audience immersion for all endpoint devices, expanded artistic control for audio content creators, improved content-dependent (descriptive) metadata for improved rendering, expanded flexibility and scalability for playback, timbre preservation and matching, and the opportunity for dynamic rendering of content based on user position and interaction. The system includes several components, including new mixing tools for content creators, new and updated packaging and encoding tools for distribution and playback, in-home dynamic mixing and rendering (appropriate for different configurations), and additional speaker locations and designs.
[0054] The adaptive audio ecosystem is configured to be a complete, end-to-end, next-generation audio system using the adaptive audio format, including content creation, packaging, distribution, and playback/rendering across a wide range of endpoint devices and use cases. As shown in Figure 4B, the system originates with content captured to and from a number of different use cases, 422 and 424. These capture points include all relevant content formats, including film, TV, live broadcast (and sound), UGC, games, and music. The content, as it passes through the ecosystem, goes through several key phases, such as: pre-processing and authoring tools; translation tools (i.e., translation of adaptive audio content for cinema to consumer content distribution applications); specific adaptive audio packaging/bitstream encoding (which captures the audio essence data as well as additional metadata and audio playback information); distribution encoding using new or existing codecs (e.g., DD+, TrueHD, Dolby Pulse) for efficient distribution through various audio channels; transmission through the relevant distribution channels (broadcast, disc, mobile, Internet, etc.); and, finally, endpoint-aware dynamic rendering to reproduce and convey the adaptive audio user experience defined by the content creator, which provides the benefits of the spatial audio experience. The adaptive audio system can be used during rendering for a widely varying number of consumer endpoints, and the rendering procedures that are applied can be optimized depending on the endpoint device. For example, home theater systems and soundbars can have 2, 3, 5, 7, or even 9 separate speakers in various locations. Many other types of systems have only two speakers (TV, laptop, music dock), and almost all commonly used devices have a headphone output (PC, laptop, tablet, mobile phone, music player, and so on).
[0055] Current authoring and distribution systems for surround sound audio create and deliver audio that is intended for playback at fixed and predefined speaker locations, with limited knowledge of the type of content conveyed in the audio essence (i.e., the audio that is played back by the playback system).
The adaptive audio system, however, provides a new hybrid approach to audio creation that includes the option of both fixed speaker-location-specific audio (left channel, right channel, etc.) and object-based audio elements that have generalized 3D spatial information, including position, size, and velocity. This hybrid approach provides a balanced approach to fidelity (provided by fixed speaker locations) and flexibility in rendering (generalized audio objects). This system also provides additional useful information about the audio content through new metadata that is paired with the audio essence by the content creator at the time of content creation/authoring. This information provides detailed information about the attributes of the audio that can be used during rendering. Such attributes can include content type (dialogue, music, effects, Foley, background/ambience, etc.) as well as audio object information such as spatial attributes (3D position, object size, velocity, etc.) and useful rendering information (snap to speaker location, channel weights, gain, bass management information, etc.). The audio content and playback intent metadata can be created manually by the content creator or created through the use of automatic media intelligence algorithms that can run in the background during the authoring process and be reviewed by the content creator during a final QA phase, if desired.
[0056] Figure 4C is a block diagram of the functional components of an adaptive audio environment, under an embodiment. As shown in diagram 450, the system processes an encoded bitstream 452 that carries a hybrid object- and channel-based audio stream. The bitstream is processed by the rendering/signal processing block 454. In one embodiment, at least portions of this functional block can be implemented in the rendering block 312 illustrated in Figure 3. The rendering function 454 implements various rendering algorithms for adaptive audio, as well as certain post-processing algorithms, such as upmixing, processing of direct versus reflected sound, and the like. The output of the renderer is provided to the speakers 458 through bidirectional interconnections 456. In one embodiment, the speakers 458 comprise a number of individual drivers that can be arranged in a surround sound or similar configuration. The drivers are individually addressable and can be embodied in individual enclosures or multi-driver cabinets or arrays. The system 450 can also include microphones 460 that provide measurements of the listening environment or room characteristics that can be used to calibrate the rendering process. System configuration and calibration functions are provided in block 462. These functions can be included as part of the rendering components, or they can be implemented as separate components that are functionally coupled to the renderer. The bidirectional interconnections 456 provide the feedback signal path from the speakers in the listening environment back to the calibration component 462.
LISTENING ENVIRONMENTS
[0057] Implementations of the adaptive audio system can be deployed in a variety of different listening environments. These include three primary areas of audio playback applications: home theater systems, televisions and soundbars, and headphones. Figure 5 illustrates the use of an adaptive audio system in an exemplary home theater environment.
The system of Figure 5 illustrates a superset of components and functions that can be provided by an adaptive audio system, and certain aspects can be reduced or removed based on the user's needs, while still providing an enhanced experience. System 500 includes various different speakers and drivers in a variety of different cabinets or arrays 504. The speakers include individual drivers that provide front-firing, side-firing, and upward-firing options, as well as dynamic virtualization of the audio using certain sets of audio processing procedures. Diagram 500 illustrates a number of speakers deployed in a standard 9.1 speaker configuration. These include left and right height speakers (LH, RH), left and right speakers (L, R), a center speaker (shown as a modified center speaker), and left and right surround and back speakers (LS, RS, LB, and RB; the low-frequency element LFE is not shown).
[0058] Figure 5 illustrates the use of a center channel speaker 510 used in a central location of the listening environment. In one embodiment, this speaker is implemented using a modified center channel or high-resolution center channel 510. Such a speaker can be a front-firing center channel array with individually addressable speakers that allow discrete pans of audio objects through the array that match the movement of video objects on the screen. It can be embodied as a high-resolution center channel (HRC) speaker, such as that described in International Application Number PCT/US2011/028783, which is incorporated herein by reference in its entirety. The HRC speaker 510 can also include side-firing speakers, as shown. These could be activated and used if the HRC speaker is used not only as a center speaker, but also as a speaker with soundbar capabilities. The HRC speaker can also be embodied above and/or on the sides of the screen 502 to provide a two-dimensional, high-resolution panning option for audio objects. The center speaker 510 could also include additional drivers and implement a steerable sound beam with separately controlled sound zones.
[0059] System 500 also includes a near-field effect (NFE) speaker 512 that can be located directly in front of, or near in front of, the listener, such as on a table in front of a seating location. With adaptive audio, it is possible to bring audio objects into the room and not have them merely fixed to the perimeter of the room. Therefore, having objects traverse three-dimensional space is an option. An example is one in which an object can originate at the L speaker, travel through the listening environment via the NFE speaker, and end at the RS speaker. Various different speakers may be suitable for use as an NFE speaker, such as a battery-powered wireless speaker.
[0060] Figure 5 illustrates the use of dynamic speaker virtualization to provide an immersive user experience in the home theater environment. Dynamic speaker virtualization is enabled through dynamic control of the parameters of the speaker virtualization algorithms based on object spatial information provided by the adaptive audio content. This dynamic virtualization is shown in Figure 5 for the L and R speakers, where it is natural to consider using it to create the perception of objects moving along the sides of the listening environment. A separate virtualizer can be used for each relevant object, and the combined signal can be sent to the L and R speakers to create a multi-object virtualization effect.
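A toy version of this per-object virtualization might look like the following Python sketch, which applies interaural level and time differences per object and sums the results into the L and R feeds. Real virtualizers use full HRTF-based processing, so this is only a simplified illustration under assumed parameter values.

```python
import numpy as np

def virtualize(mono, azimuth_deg, fs=48000):
    """Toy positional virtualizer using only interaural level and time
    differences (a stand-in for full HRTF-based virtualization)."""
    az = np.radians(azimuth_deg)
    itd = 0.0007 * np.sin(az)                # ~0.7 ms max interaural delay
    shift = int(round(abs(itd) * fs))
    gL = np.sqrt(0.5 * (1.0 - np.sin(az)))   # constant-power level difference
    gR = np.sqrt(0.5 * (1.0 + np.sin(az)))
    L, R = gL * mono, gR * mono
    pad = np.zeros(shift)
    if itd > 0:                               # source to the right: delay left ear
        L = np.concatenate([pad, L])[:len(mono)]
    else:                                     # source to the left: delay right ear
        R = np.concatenate([pad, R])[:len(mono)]
    return L, R

# One virtualizer per object; the combined signal feeds the L and R speakers.
objs = [(0.05 * np.random.randn(48000), -60.0),   # object moving along the left
        (0.05 * np.random.randn(48000),  30.0)]   # object to the front right
out_L = sum(virtualize(sig, az)[0] for sig, az in objs)
out_R = sum(virtualize(sig, az)[1] for sig, az in objs)
```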
Dynamic virtualization effects are shown for the L and R speakers, as well as for the NFE speaker, which is intended to be a stereo speaker (with two independent inputs). This speaker, together with audio object size and position information, could be used to create a diffuse or point-source near-field audio experience. Similar virtualization effects can also be applied to any or all of the other speakers in the system. In one embodiment, a camera can provide additional information about the listener's position and identity that could be used by the adaptive audio renderer to provide a more convincing experience, truer to the artistic intent of the mixer.
[0061] The adaptive audio renderer understands the spatial relationship between the mix and the playback system. In some instances of a playback environment, discrete speakers may be available in all relevant areas of the listening environment, including overhead positions, as shown in Figure 1. In those cases in which discrete speakers are available in certain locations, the renderer can be configured to "snap" objects to the nearest speakers rather than creating a phantom image between two or more speakers through panning or the use of speaker virtualization algorithms. Although this slightly distorts the spatial representation of the mix, it also allows the renderer to avoid unintended phantom images. For example, if the angular position of the left speaker of the mixing stage does not correspond to the angular position of the left speaker of the playback system, enabling this function would avoid having a constant phantom image of the original left channel.
[0062] In many cases, however, and especially in a home environment, certain speakers, such as ceiling-mounted overhead speakers, are not available. In this case, certain sets of virtualization procedures are implemented by the renderer to reproduce overhead audio content through existing wall- or floor-mounted speakers. In one embodiment, the adaptive audio system includes a modification to the standard configuration through the addition of both a front-firing capability and a top-firing (or "upward-firing") capability for each speaker. In traditional home applications, speaker manufacturers have tried to introduce new driver configurations beyond front-firing transducers and have been faced with the problem of trying to identify which of the original audio signals (or modifications to them) should be sent to these new drivers. With the adaptive audio system, there is very specific information regarding which audio objects should be rendered above the standard horizontal plane. In one embodiment, the height information present in the adaptive audio system is rendered using the upward-firing drivers. Likewise, side-firing speakers can be used to render certain other content, such as ambience effects.
[0063] One advantage of the upward-firing drivers is that they can be used to reflect sound off a hard ceiling surface to simulate the presence of overhead/height speakers positioned in the ceiling. A compelling attribute of adaptive audio content is that the spatially diverse audio is reproduced using an array of overhead speakers. As stated above, however, in many cases installing overhead speakers is too expensive or impractical in a home environment. By simulating height speakers using speakers normally positioned in the horizontal plane, a convincing 3D experience can be created with easily placed speakers.
In this case, the adaptive audio system uses the upward-firing/height-simulating drivers in a new way, in which audio objects and their spatial reproduction information are used to create the audio that is reproduced by the upward-firing drivers.
[0064] Figure 6 illustrates the use of an upward-firing driver using reflected sound to simulate a single overhead speaker in a home theater. It should be noted that any number of upward-firing drivers could be used in combination to create multiple simulated height speakers. Alternatively, a number of upward-firing drivers can be configured to transmit sound to substantially the same point on the ceiling to achieve a certain sound intensity or effect. Diagram 600 illustrates an example in which the usual listening position 602 is located at a particular place within a listening environment. The system does not include any height speakers for transmitting audio content that contains height cues. Instead, the speaker cabinet or speaker array 604 includes an upward-firing driver together with the front-firing driver(s). The upward-firing driver is configured (with respect to location and tilt angle) to send its sound wave 606 to a particular point on the ceiling 608, from where it will be reflected back down to the listening position 602. It is assumed that the ceiling is made of a material and composition suitable for adequately reflecting sound down into the listening environment. The relevant characteristics of the upward-firing driver (e.g., size, power, location, etc.) can be selected based on the ceiling composition, the room size, and other relevant characteristics of the listening environment. Although only one upward-firing driver is shown in Figure 6, multiple upward-firing drivers can be incorporated into a playback system in some embodiments.
[0065] In one embodiment, the adaptive audio system uses upward-firing drivers to provide the height element. In general, it has been shown that incorporating signal processing to introduce perceptual height cues into the audio signal that is fed to the upward-firing drivers improves the positioning and perceived quality of the virtual height signal. For example, a parametric perceptual binaural hearing model has been developed to create a height cue filter that, when used to process audio that is reproduced by an upward-firing driver, improves the perceived quality of the reproduction. In one embodiment, the height cue filter is derived from both the physical speaker location (approximately level with the listener) and the reflected speaker location (above the listener). For the physical speaker location, a directional filter is determined based on a model of the outer ear (or pinna). An inverse of this filter is then determined and used to remove the height cues of the physical speaker. Next, for the reflected speaker location, a second directional filter is determined using the same outer ear model. This filter is applied directly, essentially reproducing the cues the ear would receive if the sound were above the listener. In practice, these filters can be combined in a way that allows for a single filter that both (1) removes the height cue of the physical speaker location and (2) inserts the height cue of the reflected speaker location. Figure 16 is a graph illustrating the frequency response of such a combined filter. The combined filter can be used in a way that allows for some adjustability with respect to the aggressiveness or amount of filtering that is applied.
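The combined filter described above can be sketched as the cascade of the inverse of the physical-location directional filter with the reflected-location directional filter, scaled by an adjustable amount. The shelf-shaped magnitude responses below are illustrative stand-ins for the pinna-model filters, which the text does not specify.

```python
import numpy as np

def shelf(f, f0, gain_db):
    """Illustrative first-order high-shelf magnitude response; a stand-in
    for a pinna directional filter derived from a binaural hearing model."""
    g = 10.0 ** (gain_db / 20.0)
    return 1.0 + (g - 1.0) / (1.0 + (f0 / np.maximum(f, 1e-6)) ** 2)

f = np.linspace(20.0, 20000.0, 512)       # frequency axis in Hz
H_phys = shelf(f, 6000.0, +4.0)           # assumed directional response at ear level
H_refl = shelf(f, 6000.0, -6.0)           # assumed directional response from overhead

amount = 0.8                               # 0..1: how aggressively to apply the cues
H_combined = (H_refl / H_phys) ** amount   # remove physical cue, insert overhead cue
```

An amount below 1.0 applies the cue replacement only partially, which matters because in practice only part of the physical speaker's sound reaches the listener directly.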
For example, in some cases it may be beneficial not to completely remove the height cue of the physical speaker, or not to completely apply the height cue of the reflected speaker, since only part of the sound from the physical speaker reaches the listener directly (with the remainder being reflected off the ceiling).
SPEAKER CONFIGURATION
[0066] A main consideration of the adaptive audio system is the speaker configuration. The system uses individually addressable drivers, and an array of such drivers is configured to provide a combination of both direct and reflected sound sources. A bidirectional link to the system controller (e.g., A/V receiver, set-top box) allows audio and configuration data to be sent to the speaker, and speaker and sensor information to be sent back to the controller, creating an active, closed-loop system.
[0067] For purposes of the description, the term "driver" means a single electroacoustic transducer that produces sound in response to an electrical audio input signal. A driver can be implemented in any suitable type, geometry, and size, and can include horns, cones, ribbons, and similar transducers. The term "speaker" means one or more drivers in a unitary enclosure. Figure 7A illustrates a speaker that has a plurality of drivers in a first configuration, under an embodiment. As shown in Figure 7A, a speaker enclosure 700 has a number of individual drivers mounted within the enclosure. Typically, the enclosure will include one or more front-firing drivers 702, such as woofers, midrange speakers, or tweeters, or any combination thereof. One or more side-firing drivers 704 can also be included. The front- and side-firing drivers are typically mounted flush with the side of the enclosure so that they project sound perpendicularly outward from the vertical plane defined by the speaker, and these drivers are usually permanently fixed within the cabinet 700. For the adaptive audio system that features reflected sound rendering, one or more upward-tilted drivers 706 are also provided. These drivers are positioned so that they project sound at an angle up to the ceiling, from where it can then bounce back down to a listener, as shown in Figure 6. The degree of tilt can be set depending on the characteristics of the listening environment and the requirements of the system. For example, the upward driver 706 can be tilted up between 30 and 60 degrees, and can be positioned above the front-firing driver 702 in the speaker enclosure 700 so as to minimize interference with the sound waves produced by the front-firing driver 702. The upward-firing driver 706 can be installed at a fixed angle, or it can be installed so that the tilt angle can be adjusted manually. Alternatively, a servo mechanism can be used to allow automatic or electrical control of the tilt angle and projection direction of the upward-firing driver. For certain sounds, such as ambient sound, the upward-firing driver can be pointed straight up out of an upper surface of the speaker enclosure 700 to create what could be referred to as a "top-firing" driver. In this case, a large component of the sound can reflect back down onto the speaker, depending on the acoustic characteristics of the ceiling. In most cases, however, some tilt angle is typically used to help project the sound, through reflection off the ceiling, to a different or more central location within the listening environment, as shown in Figure 6.
[0068] Figure 7A is intended to illustrate one example of a speaker and driver configuration, and many other configurations are possible.
[0068] Figure 7A is intended to illustrate one example of a speaker and driver configuration, and many other configurations are possible. For example, the up-firing driver can be provided in its own enclosure to allow use with existing speakers. Figure 7B illustrates a speaker system that has drivers distributed in multiple enclosures, in one embodiment. As shown in Figure 7B, the up-firing driver 712 is provided in a separate enclosure 710, which can then be placed next to or on top of an enclosure 714 that has front- and/or side-firing drivers 716 and 718. The drivers can also be enclosed within a speaker soundbar, as used in many home theater environments, in which a number of small or medium-sized drivers are arranged along a geometric axis within a single horizontal or vertical enclosure. Figure 7C illustrates the placement of drivers within a soundbar, in one embodiment. In this example, the soundbar enclosure 730 is a horizontal soundbar that includes side-firing drivers 734, up-firing drivers 736, and front-firing driver(s) 732. Figure 7C is intended to be an example configuration only, and any practical number of drivers for each of the functions - front, side, and up firing - can be used. [0069] For the embodiments of Figures 7A to 7C, it should be noted that the drivers can be of any suitable shape, size, and type, depending on the required frequency response characteristics, as well as any other relevant constraints such as size, power rating, component cost, and so on. [0070] In a typical adaptive audio environment, a number of speaker enclosures will be contained within the listening environment. Figure 8 illustrates an exemplary placement of speakers that have individually addressable drivers, including up-firing drivers, placed within a listening environment. As shown in Figure 8, the listening environment 800 includes four individual speakers 806, each of which has at least one front-firing, side-firing, and up-firing driver. The listening environment may also contain fixed drivers used for surround sound applications, such as center speaker 802 and subwoofer or LFE 804. As can be seen in Figure 8, depending on the size of the listening environment and the respective speaker units, the proper placement of speakers 806 within the listening environment can provide a rich audio environment resulting from the reflection of sounds off the ceiling from the number of up-firing drivers. The speakers can be aimed to provide reflection from one or more points on the ceiling plane, depending on content, listening environment size, listener position, acoustic characteristics, and other relevant parameters. [0071] Speakers used in an adaptive audio system for a home theater or similar listening environment may use a configuration that builds on existing surround sound configurations (e.g., 5.1, 7.1, 9.1, etc.). In this case, a number of drivers are provided and defined in accordance with the known surround sound convention, with additional drivers and definitions provided for the up-firing sound components. [0072] Figure 9A illustrates a speaker configuration for an adaptive 5.1 audio system that uses multiple addressable drivers for reflected audio, in one embodiment. In configuration 900, a standard 5.1 speaker footprint comprising LFE 901, center speaker 902, front L/R speakers 904/906, and rear L/R speakers 908/910 is provided with eight additional drivers, giving a total of 14 addressable drivers. These eight additional drivers are denoted "up" and "side" in addition to "forward" (or "front") on each speaker unit 902-910.
The direct forward drivers would be driven by subchannels that contain adaptive audio objects and any other components that are designed to have a high degree of directionality. The up-firing (reflected) drivers could contain subchannel content that is more omnidirectional or directionless, but are not so limited. Examples would include background music or ambient sounds. If the input to the system comprises legacy surround sound content, then that content could be intelligently factored into direct and reflected subchannels and fed to the appropriate drivers. [0073] For the direct subchannels, the speaker enclosure would contain drivers in which the median geometric axis of the driver bisects the "sweet spot", or acoustic center, of the listening environment. The up-firing drivers would be positioned so that the angle between the driver's median plane and the acoustic center is some angle in the range of 45 to 180 degrees. In the case of positioning the driver at 180 degrees, the rear-facing driver could provide sound diffusion by reflecting off a rear wall. This configuration uses the acoustic principle that, after time-aligning the up-firing drivers with the direct drivers, the early-arrival signal component would be coherent, while the late-arrival components would benefit from the natural diffusion provided by the listening environment. [0074] In order to achieve the height cues provided by the adaptive audio system, the up-firing drivers could be angled upward from the horizontal plane, and at the extreme could be positioned to radiate directly upward and reflect off one or more reflective surfaces, such as a flat ceiling, or an acoustic diffuser placed immediately above the enclosure. To provide additional directionality, the center speaker could utilize a soundbar configuration (as shown in Figure 7C) with the ability to steer sound across the screen to provide a high-resolution center channel. [0075] The 5.1 configuration of Figure 9A could be expanded by adding two additional rear enclosures, similar to a standard 7.1 configuration. Figure 9B illustrates a speaker configuration for a 7.1 adaptive audio system that uses multiple addressable drivers for reflected audio in such an embodiment. As shown in configuration 920, the two additional enclosures 922 and 924 are placed in the 'left surround' and 'right surround' positions, with the side speakers pointing toward the side walls in a manner similar to the front enclosures, and the up-firing drivers set to bounce off the ceiling midway between the existing front and rear pairs. Such incremental additions can be made as often as desired, with the additional pairs filling the gaps along the side or rear walls. Figures 9A and 9B illustrate just a few examples of possible extended surround speaker configurations that can be used in conjunction with up- and side-firing speakers in an adaptive audio system for listening environments, and many others are also possible.
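One way to picture how such an extended configuration could be described to a renderer is as a flat enumeration of individually addressable drivers. The sketch below is purely illustrative (the type and field names are my own, not the patent's schema); it reproduces the 14-driver count of the Figure 9A configuration.

```python
from dataclasses import dataclass

@dataclass
class Driver:
    enclosure: str      # which speaker cabinet the driver lives in
    kind: str           # "front", "side", or "up" (firing direction)
    tilt_deg: float     # elevation above horizontal; 0 for direct drivers

# A 5.1-based adaptive layout: 14 addressable drivers, as in the text
# (the four corner speakers each add one "side" and one "up" driver).
LAYOUT = [
    Driver("center",  "front", 0.0),
    Driver("front-L", "front", 0.0), Driver("front-L", "side", 0.0), Driver("front-L", "up", 45.0),
    Driver("front-R", "front", 0.0), Driver("front-R", "side", 0.0), Driver("front-R", "up", 45.0),
    Driver("rear-L",  "front", 0.0), Driver("rear-L",  "side", 0.0), Driver("rear-L",  "up", 45.0),
    Driver("rear-R",  "front", 0.0), Driver("rear-R",  "side", 0.0), Driver("rear-R",  "up", 45.0),
    Driver("lfe",     "front", 0.0),
]
assert len(LAYOUT) == 14
```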
[0076] As an alternative to the x.1 configurations described above, a more flexible capsule-based system could be utilized, whereby each driver is contained within its own enclosure, which could then be mounted in any convenient location. This would use a driver configuration such as that shown in Figure 7B. These individual units can then be grouped in a manner similar to the x.1 configurations, or they could be scattered individually around the listening environment. The capsules are not necessarily restricted to being placed at the edges of the listening environment; they could also be placed on any surface within it (e.g., coffee table, bookcase, etc.). Such a system would be easy to expand, allowing the user to add more speakers over time to create a more immersive experience. If the speakers are wireless, then the capsule system could include the ability to dock the speakers for charging purposes. In this design, the capsules could be docked together so that they act as a single speaker while charging, perhaps to listen to stereo music, and then undocked and positioned around the listening environment for adaptive audio content. [0077] In order to enhance the configurability and accuracy of the adaptive audio system using up-firing addressable drivers, a number of sensors and feedback devices could be added to the enclosures to inform the renderer of characteristics that could be used in the rendering algorithm. For example, a microphone installed in each enclosure would allow the system to measure the phase, frequency, and reverberation characteristics of the listening environment, along with the position of the speakers relative to one another, using triangulation and HRTF-like functions of the enclosures themselves (a toy position-estimation sketch follows at the end of this discussion). Inertial sensors (e.g., gyroscopes, compasses, etc.) could be used to detect the direction and angle of the enclosures; and optical and visual sensors (e.g., using a laser-based infrared rangefinder) could be used to provide positional information relative to the listening environment itself. These represent just a few possibilities for additional sensors that could be used in the system, and others are possible as well. [0078] Such sensor systems can be further enhanced by allowing the position of the drivers and/or the acoustic modifiers of the enclosures to be automatically adjustable by means of electromechanical servos. This would allow the directionality of the drivers to be changed at runtime to suit their placement in the listening environment relative to walls and other drivers ("active steering"). Similarly, any acoustic modifiers (such as baffles, horns, or waveguides) could be tuned to provide the correct frequency and phase responses for optimal reproduction in any listening environment setting ("active tuning"). Both active steering and active tuning could be performed during initial listening environment setup (for example, in conjunction with the auto-EQ/auto room-setup system) or during playback in response to the content being rendered.
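The speaker-position measurement mentioned in paragraph [0077] can be pictured as multilateration from times of flight. The sketch below is a toy 2-D version under idealized assumptions (known microphone positions, known emission time, no clock offsets); a real system would also estimate height and synchronization terms.

```python
import numpy as np
from scipy.optimize import least_squares

SPEED_OF_SOUND = 343.0  # m/s at room temperature

def locate_speaker(mic_positions, arrival_delays_s):
    """Estimate an emitting speaker's (x, y) from the time of flight of a
    test chirp to microphones at known positions (toy multilateration)."""
    dists = SPEED_OF_SOUND * np.asarray(arrival_delays_s)
    def residual(p):
        return np.linalg.norm(mic_positions - p, axis=1) - dists
    return least_squares(residual, x0=mic_positions.mean(axis=0)).x

mics = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 3.0]])   # assumed layout (m)
true_pos = np.array([2.5, 1.5])
delays = np.linalg.norm(mics - true_pos, axis=1) / SPEED_OF_SOUND
print(locate_speaker(mics, delays))                      # ~[2.5, 1.5]
```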
BIDIRECTIONAL INTERCONNECTION
[0079] Once configured, the speakers must be connected to the rendering system. Traditional interconnects are typically of two types: speaker-level inputs for passive speakers and line-level inputs for active speakers. As shown in Figure 4C, the adaptive audio system 450 includes a bidirectional interconnection function. This interconnection is embodied within a set of physical and logical connections between the rendering stage 454, the amplifier/speaker stage 458, and the microphone stage 460. The ability to address multiple drivers in each speaker cabinet is supported by these intelligent interconnects between the sound source and the speakers. The bidirectional interconnect that carries signals from the sound source (renderer) to the speaker comprises control signals and audio signals. The signal from the speaker back to the sound source consists of both control signals and audio signals, where the audio signals in this case are audio from the optional built-in microphones. Power can also be supplied as part of the bidirectional interconnect, at least for the case where the speakers/drivers are not powered separately. [0080] Figure 10 is a diagram 1000 that illustrates the composition of a bidirectional interconnection, in one embodiment. The sound source 1002, which can represent a renderer plus a sound processor/amplifier chain, is logically and physically coupled to the speaker cabinet 1004 via a pair of interconnect links 1006 and 1008. The interconnect 1006 from the sound source 1002 to the drivers 1005 inside the speaker cabinet 1004 comprises an electro-acoustic signal for each driver, one or more control signals, and optional power. The interconnect 1008 from the speaker cabinet 1004 back to the sound source 1002 comprises sound signals from the microphone 1007 or other sensors, used for renderer calibration or other similar sound-processing functionality. The feedback interconnect 1008 also carries certain driver definitions and parameters that are used by the renderer to modify or process the sound signals sent to the drivers over the interconnect 1006. [0081] In one embodiment, each driver in each of the system's cabinets is assigned an identifier (for example, a numerical designation) during system configuration. Each speaker cabinet (enclosure) can also be uniquely identified. This numerical designation is used by the speaker cabinet to determine which audio signal is sent to which driver within the cabinet. The designation is stored in the speaker cabinet in a suitable memory device. Alternatively, each driver can be configured to store its own identifier in local memory. In a further alternative, such as one in which the drivers/speakers have no local storage capability, the identifiers can be stored in the rendering stage or in other components within the sound source 1002. During a speaker discovery process, each speaker (or a central database) is queried by the sound source for its profile. The profile defines certain driver definitions, including the number of drivers in a speaker cabinet or other defined arrangement, the acoustic characteristics of each driver (e.g., driver type, frequency response, and so on), the x, y, z position of the center of each driver relative to the center of the front face of the speaker cabinet, the angle of each driver relative to a defined plane (e.g., ceiling, floor, vertical axis of the cabinet, etc.), and the number of microphones and their characteristics. Other relevant driver and microphone/sensor parameters can also be defined. In one embodiment, the speaker cabinet profile and driver definitions can be expressed as one or more XML documents used by the renderer.
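The text specifies only that the profile can be expressed as XML; the document below is a guess at what such a profile might contain (all element and attribute names are invented for illustration), together with a few lines showing how a renderer might read it.

```python
import xml.etree.ElementTree as ET

# Hypothetical speaker-cabinet profile; the schema is illustrative only.
PROFILE_XML = """
<speaker id="front-left">
  <driver id="1" type="front" x="0.0" y="0.05" z="0.12" angle="0"/>
  <driver id="2" type="up"    x="0.0" y="0.05" z="0.30" angle="45"/>
  <microphone count="1"/>
</speaker>
"""

root = ET.fromstring(PROFILE_XML)
for drv in root.iter("driver"):
    print(drv.get("id"), drv.get("type"), drv.get("angle"))
```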
[0082] In one possible deployment, an Internet Protocol (IP) control network is created between the sound source 1002 and the speaker cabinets 1004. Each speaker cabinet and the sound source act as single network endpoints and are given a link-local address upon initialization or power-up. An auto-discovery mechanism such as zero-configuration networking (zeroconf) can be used to allow the sound source to locate every speaker on the network. Zero-configuration networking is an example of a process that automatically creates a usable IP network without manual operator intervention or special configuration servers, and other similar sets of procedures can be used. Given an intelligent network system, multiple sources can reside on the IP network along with the speakers. This allows multiple sources to drive the speakers directly, without routing sound through a "master" audio source (e.g., a traditional A/V receiver). If another source attempts to address the speakers, communication is performed among all the sources to determine which source is currently "active", whether a source needs to be active, and whether control can be transitioned to the new sound source. Sources can be assigned a priority during manufacturing based on their classification; for example, a telecommunications source may have a higher priority than an entertainment source. In a multi-room environment, such as a typical home, all the speakers within the overall environment may reside on a single network, but they may not need to be addressed simultaneously. During setup and auto-configuration, the sound level reported back over the interconnect 1008 can be used to determine which speakers are located in the same physical space. Once this information is determined, the speakers can be combined into groups. In that case, group IDs can be assigned and made part of the driver definitions. The group ID is sent to each speaker, and each group can be addressed simultaneously by the sound source 1002. [0083] As shown in Figure 10, an optional power signal can be transmitted over the bidirectional interconnect. Speakers can be passive (requiring external power from the sound source) or active (requiring power from an electrical outlet). If the speaker system consists of active speakers without wireless support, the input to each speaker consists of an IEEE 802.3-compliant wired Ethernet input. If the speaker system consists of active speakers with wireless support, the input to each speaker consists of an IEEE 802.11-compliant wireless Ethernet input, or alternatively, a wireless standard specified by the WISA organization. Passive speakers can be supplied with suitable power signals provided directly by the sound source.
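The zeroconf-style discovery described in paragraph [0082] could look something like the sketch below, which uses the third-party python-zeroconf package purely as an illustration. The service type name is hypothetical, and the listener is a plain duck-typed class so it works across package versions; this is one possible realization, not the patent's protocol.

```python
from zeroconf import ServiceBrowser, Zeroconf

SERVICE_TYPE = "_adaptive-audio._tcp.local."   # hypothetical service type name

class SpeakerListener:
    """Collects speaker endpoints as they announce themselves on the network."""
    def add_service(self, zc, type_, name):
        info = zc.get_service_info(type_, name)
        if info:
            print("speaker found:", name, info.parsed_addresses(), info.port)
    def update_service(self, zc, type_, name):
        pass
    def remove_service(self, zc, type_, name):
        print("speaker left:", name)

zc = Zeroconf()
browser = ServiceBrowser(zc, SERVICE_TYPE, SpeakerListener())
# ...keep the process alive while browsing; call zc.close() when done.
```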
SYSTEM CONFIGURATION AND CALIBRATION
[0084] As shown in Figure 4C, the adaptive audio system functionality includes a calibration function 462. This function is enabled by the microphone 1007 and interconnect 1008 connections shown in Figure 10. The function of the microphone component in the system 1000 is to measure the response of the individual drivers in the listening environment in order to derive an overall system response. Multiple microphone topologies can be used for this purpose, including a single microphone or an array of microphones. The simplest case is when a single omnidirectional measurement microphone positioned at the center of the listening environment is used to measure the response of each driver. If the listening environment and playback conditions warrant a more refined analysis, multiple microphones can be used instead. The most convenient location for multiple microphones is within the physical speaker cabinets of the particular speaker configuration used in the listening environment. Microphones installed in each enclosure allow the system to measure the response of each driver at multiple positions in the listening environment. An alternative to this topology is to use multiple omnidirectional measurement microphones positioned at likely listener locations in the listening environment. [0085] The microphone(s) are used to enable automatic configuration and calibration of the renderer and post-processing algorithms. In the adaptive audio system, the renderer is responsible for converting a hybrid object- and channel-based audio stream into individual audio signals designated for specific addressable drivers within one or more physical speakers. The post-processing component can include: delay, equalization, gain, speaker virtualization, and upmixing. The speaker configuration often represents critical information that the renderer component can use to convert a hybrid object- and channel-based audio stream into individual driver audio signals to provide optimal reproduction of the audio content. The system configuration information includes: (1) the number of physical speakers in the system, (2) the number of individually addressable drivers in each speaker, and (3) the position and direction of each individually addressable driver relative to the listening environment geometry. Other characteristics are also possible. Figure 11 illustrates the function of an automatic configuration and system calibration component, in one embodiment. As shown in diagram 1100, an array 1102 of one or more microphones provides acoustic information to the configuration and calibration component 1104. This acoustic information captures certain relevant characteristics of the listening environment. The configuration and calibration component 1104 then provides this information to the renderer 1106 and any relevant post-processing components 1108, so that the audio signals that are ultimately sent to the speakers are adjusted and optimized for the listening environment. [0086] The number of physical speakers in the system and the number of individually addressable drivers in each speaker are physical speaker properties. These properties are passed directly from the speakers through the bidirectional interconnect 456 to the renderer 454. The renderer and speakers use a common discovery protocol, so that when speakers are connected to or disconnected from the system, the renderer is notified of the change and can reconfigure the system accordingly. [0087] The geometry (size and shape) of the listening environment is a necessary item of information in the configuration and calibration process. The geometry can be determined in a number of different ways. In a manual configuration mode, the width, length, and height of the minimum bounding cube for the listening environment are entered into the system by the listener or technician through a user interface that provides input to the renderer or other processing unit within the adaptive audio system. Several different sets of UI procedures and tools can be used for this purpose. For example, the listening environment geometry can be sent to the renderer by a program that automatically maps or traces the listening environment geometry. Such a system can use a combination of computer vision, sonar, and 3D laser-based physical mapping. [0088] The renderer uses the position of the speakers within the listening environment geometry to derive the audio signals for each individually addressable driver, including the direct and reflected (up-firing) drivers. Direct drivers are those aimed so that the majority of their dispersion pattern intersects the listening position before being diffused by one or more reflective surfaces (such as a floor, wall, or ceiling). Reflected drivers are those aimed so that the majority of their dispersion pattern is reflected before intersecting the listening position, as illustrated in Figure 6.
If a system is in a manual configuration mode, the 3D coordinates of each direct driver can be entered into the system via a UI. For reflected drivers, the 3D coordinates of the primary reflection are entered in the UI. Lasers or similar sets of procedures can be used to visualize the dispersion pattern of the diffuse drivers on the surfaces of the listening environment, so that the 3D coordinates can be measured and entered into the system manually. [0089] Driver steering and positioning are typically performed using sets of manual or automatic procedures. In some cases, inertial sensors can be incorporated into each speaker. In this mode, the center speaker is designated as the "master", and its orientation measurement is taken as the reference. The other speakers then transmit the dispersion patterns and time signatures for each of their individually addressable drivers. Coupled with the listening environment geometry, the difference between the center speaker's reference angle and that of each additional driver provides enough information for the system to automatically determine whether a driver is direct or reflected. [0090] The speaker position configuration can be fully automated if a 3D positional (i.e., Ambisonic) microphone is used. In this mode, the system sends a test signal to each driver and records the response. Depending on the type of microphone, the signals may need to be transformed into an x, y, z representation. These signals are analyzed to find the x, y, and z components of the dominant first arrival. Coupled with the listening environment geometry, this usually provides enough information for the system to automatically set the 3D coordinates of all speaker positions, direct or reflected. Depending on the listening environment geometry, a hybrid combination of the three described methods for determining speaker coordinates can be more effective than using any one set of techniques alone.
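The "dominant first arrival" analysis of paragraph [0090] can be approximated very crudely from a B-format (W, X, Y, Z) capture: window around the energy peak of the omni channel and average the products of W with each directional channel (an intensity-like estimate). The sketch below is my own simplification, not a production Ambisonics analyzer, and assumes the four channels are already time-aligned arrays.

```python
import numpy as np

def first_arrival_direction(w, x, y, z, fs=48000, win_ms=2.0):
    """Unit vector toward the dominant first arrival, estimated from a
    B-format impulse-response capture by averaging W*(X, Y, Z) products
    over a short window starting at the energy peak of W."""
    n0 = int(np.argmax(np.abs(w)))
    n1 = n0 + int(fs * win_ms / 1000)
    v = np.array([np.sum(w[n0:n1] * c[n0:n1]) for c in (x, y, z)])
    return v / (np.linalg.norm(v) + 1e-12)
```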
[0091] The speaker configuration information is a required component for configuring the renderer. Speaker calibration information is also needed to configure the post-processing chain: delay, equalization, and gain. Figure 12 is a flowchart illustrating the process steps for performing automatic speaker calibration using a single microphone, in one embodiment. In this mode, the delay, equalization, and gain are automatically calculated by the system using a single omnidirectional measurement microphone located at the middle of the listening position. As shown in diagram 1200, the process begins by measuring the room impulse response for each driver alone, block 1202. The delay for each driver is then calculated by finding the offset of the peak of the cross-correlation of the acoustic impulse response (captured with the microphone) with the directly captured electrical impulse response, block 1204. In block 1206, the calculated delay is applied to the directly captured (reference) impulse response. The process then determines the per-band and wideband gain values that, when applied to the measured impulse response, result in the minimum difference between it and the directly captured (reference) impulse response, block 1208. This can be done by taking the windowed FFT of the measured and reference impulse responses, calculating the per-bin magnitude ratios between the two signals, applying a median filter to the per-bin magnitude ratios, calculating per-band gain values by averaging the gains for all bins that fall completely within a band, calculating a wideband gain by taking the average of all the per-band gains, subtracting the wideband gain from the per-band gains, and applying the small-room X curve (-2 dB/octave above 2 kHz). Once the gain values are determined in block 1208, the process determines the final delay values by subtracting the minimum delay from the others, so that at least one driver in the system will always have zero additional delay, block 1210. [0092] In the case of automatic calibration using multiple microphones, the delay, equalization, and gain are automatically calculated by the system using multiple omnidirectional measurement microphones. The process is substantially identical to the single-microphone set of techniques, except that it is repeated for each of the microphones and the results are averaged.
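A condensed sketch of the single-microphone procedure is shown below. It follows blocks 1204-1210 under simplifying assumptions: the windowing step is omitted, the per-band/wideband normalization is done in the linear domain rather than in dB, and the band edges and FFT size are my own illustrative choices.

```python
import numpy as np
from scipy.signal import correlate, medfilt

def driver_delay_samples(measured_ir, reference_ir):
    """Block 1204: delay of the mic-captured response relative to the directly
    captured electrical reference, as the offset of the cross-correlation peak."""
    xc = correlate(measured_ir, reference_ir, mode="full")
    return int(np.argmax(np.abs(xc))) - (len(reference_ir) - 1)

def per_band_gains(measured_ir, reference_ir, fs, band_edges_hz, nfft=8192):
    """Blocks 1206-1208: per-bin magnitude ratio (reference / measured),
    median-smoothed, averaged over the bins inside each band, then normalized
    by the wideband (mean) gain."""
    f = np.fft.rfftfreq(nfft, 1 / fs)
    ratio = (np.abs(np.fft.rfft(reference_ir, nfft)) + 1e-9) / \
            (np.abs(np.fft.rfft(measured_ir, nfft)) + 1e-9)
    ratio = medfilt(ratio, kernel_size=9)
    gains = [float(np.mean(ratio[(f >= lo) & (f < hi)]))
             for lo, hi in zip(band_edges_hz[:-1], band_edges_hz[1:])]
    wideband = float(np.mean(gains))
    return np.array(gains) / wideband, wideband

def small_room_x_curve_db(freq_hz):
    """Target roll-off applied on top of the gains: -2 dB/octave above 2 kHz."""
    return np.where(freq_hz > 2000, -2.0 * np.log2(freq_hz / 2000.0), 0.0)

def normalize_delays(delays):
    """Block 1210: subtract the minimum so at least one driver has zero extra delay."""
    m = min(delays)
    return [d - m for d in delays]
```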
ALTERNATIVE APPLICATIONS
[0093] Rather than deploying an adaptive audio system across an entire listening environment or theater, it is possible to deploy aspects of the adaptive audio system in more localized applications, such as televisions, computers, game consoles, or similar devices. This case effectively relies on speakers arranged in a flat plane corresponding to the viewing screen or monitor surface. Figure 13 illustrates the use of an adaptive audio system in an exemplary television and soundbar use case. In general, the television use case presents challenges to creating an immersive audio experience, given the often degraded quality of the equipment (TV speakers, soundbar speakers, etc.) and the speaker locations/configuration(s), which may be limited in terms of spatial resolution (i.e., no surround or rear speakers). The system 1300 of Figure 13 includes speakers in the standard television left and right locations (TV-L and TV-R), as well as left and right up-firing drivers (TV-LH and TV-RH). The television 1302 may also include a soundbar 1304 or speakers in some sort of height arrangement. In general, the size and quality of television speakers are reduced due to cost constraints and design choices, as compared to freestanding or home theater speakers. The use of dynamic virtualization, however, can help overcome these shortcomings. In Figure 13, the dynamic virtualization effect is illustrated for the TV-L and TV-R speakers, so that people at a specific listening position 1308 would hear horizontal elements associated with suitable individual audio objects rendered in the horizontal plane. Additionally, height elements associated with suitable audio objects will be rendered correctly via the reflected audio transmitted by the LH and RH drivers. The use of stereo virtualization in the television L and R speakers is similar to that in L and R home theater speakers, in that a potentially immersive dynamic speaker virtualization user experience can be made possible through dynamic control of the speaker virtualization algorithm parameters based on the object spatial information provided by the adaptive audio content. This dynamic virtualization can be used to create the perception of objects moving along the sides of the listening environment. [0094] The television environment may also include an HRC (high-resolution center) speaker, shown within the soundbar 1304. Such an HRC speaker may be a steerable unit that allows panning through the HRC array. There can be benefits (particularly for larger screens) to having a front-firing center channel array with individually addressable speakers that allows distinct panning of audio objects across the array to match the motion of video objects on the screen. This speaker is also shown as having side-firing speakers. These could be activated and used if the unit is employed as a soundbar, so that the side-firing drivers provide more immersion despite the lack of surround or rear speakers. The concept of dynamic virtualization is also shown for the HRC/soundbar speaker; dynamic virtualization is shown for the L and R speakers at the far ends of the front-firing speaker array. Again, this could be used to create the perception of objects moving along the sides of the listening environment. This modified center speaker could also include more speakers and implement a steerable sound beam with separately controlled sound zones. Also shown in the exemplary deployment of Figure 13 is an NFE (near-field effect) speaker 1306 located in front of the main listening position 1308. The inclusion of the NFE speaker can provide the greater envelopment afforded by the adaptive audio system by moving sound away from the front of the listening environment and closer to the listener. [0095] With respect to headphone rendering, the adaptive audio system maintains the creator's original intent by matching HRTFs to the spatial position. When audio is reproduced over headphones, binaural spatial virtualization can be achieved by applying a Head-Related Transfer Function (HRTF), which processes the audio and adds perceptual cues that create the perception of the audio playing in three-dimensional space rather than over standard stereo headphones. The accuracy of the spatial reproduction depends on selecting the appropriate HRTF, which can vary based on a number of factors, including the spatial position of the audio channels or objects being rendered. Using the spatial information provided by the adaptive audio system can result in the selection of one - or a continuously varying number - of HRTFs representing 3D space, to greatly improve the reproduction experience. [0096] The system also facilitates the addition of guided three-dimensional binaural rendering and virtualization. As in the case of spatial rendering, with the use of new and modified speaker types and locations, it is possible, through the use of three-dimensional HRTFs, to create cues that simulate audio coming from the horizontal plane and the vertical axis. Earlier audio formats that provided only channel and fixed speaker location information for rendering were more limited. With the adaptive audio format information, a three-dimensional binaural rendering headphone system has detailed and useful information that can be used to direct which audio elements are suitable to be rendered in both the horizontal and vertical planes. Some content may rely on the use of overhead speakers to provide a greater sense of envelopment. These audio objects and this information could be used for binaural rendering that is perceived to be above the listener's head when wearing headphones. Figure 14 illustrates a simplified representation of a three-dimensional binaural headphone virtualization experience for use in an adaptive audio system, in one embodiment. As shown in Figure 14, a headphone set 1402 used to play audio from an adaptive audio system includes audio signals 1404 in the standard x, y plane as well as in the z plane, so that the height associated with certain audio objects or sounds is reproduced in such a way that it sounds as if it originates above or below the sounds originating in x, y.
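The HRTF selection described above can be pictured as a lookup keyed by object direction. The sketch below is a simplification under stated assumptions: `hrtf_bank` is a hypothetical structure mapping (azimuth, elevation) to a pair of equal-length impulse responses, nearest-neighbor selection stands in for proper interpolation, and azimuth wraparound is ignored.

```python
import numpy as np
from scipy.signal import fftconvolve

def binauralize(mono, az_el, hrtf_bank):
    """Render a mono object at direction az_el (degrees) by convolving with
    the nearest measured HRTF pair. A real renderer would interpolate
    between measurements and re-select as object metadata moves."""
    key = min(hrtf_bank,
              key=lambda k: (k[0] - az_el[0]) ** 2 + (k[1] - az_el[1]) ** 2)
    hl, hr = hrtf_bank[key]
    return np.stack([fftconvolve(mono, hl), fftconvolve(mono, hr)])
```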
METADATA DEFINITIONS
[0097] In one embodiment, the adaptive audio system includes components that generate metadata from the original spatial audio format. The methods and components of system 300 comprise an audio rendering system configured to process one or more bitstreams that contain both conventional channel-based audio elements and audio object coding elements. A new extension layer containing the audio object coding elements is defined and added to either the channel-based audio codec bitstream or the audio object bitstream. This approach enables bitstreams that include the extension layer to be processed by renderers for use with existing speaker and driver designs, or with next-generation speakers that use individually addressable drivers and driver definitions. The spatial audio content from the spatial audio processor comprises audio objects, channels, and position metadata. When an object is rendered, it is assigned to one or more speakers according to the position metadata and the location of the playback speakers. Additional metadata can be associated with the object to alter the playback location or otherwise limit the speakers that are to be used for playback. The metadata is generated at the audio workstation in response to the engineer's mixing inputs to provide rendering cues that control spatial parameters (e.g., position, velocity, pitch, timbre, etc.) and specify which driver(s) or speaker(s) in the listening environment play the respective sounds during exhibition. The metadata is associated with the respective audio data in the workstation for packaging and transport by the spatial audio processor. [0098] Figure 15 is a table that illustrates certain metadata definitions for use in an adaptive audio system for listening environments, in one embodiment. As shown in Table 1500, the metadata definitions include: audio content type, driver definitions (number, characteristics, position, projection angle), control signals for active steering/tuning, and calibration information including room and speaker information.
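To make the categories of Table 1500 concrete, here is a hypothetical instance of per-object metadata in the spirit of the table. All field names and values are invented for illustration; the actual schema is defined by the figure, not by this sketch.

```python
object_metadata = {
    "content_type": "dialog",                       # audio content type
    "position": {"x": 0.2, "y": 0.9, "z": 0.6},     # normalized room coordinates
    "size": 0.1,
    "velocity": [0.0, 0.0, 0.0],
    "render_mode": "direct",                        # vs. "reflected" (up-firing)
    "driver_definitions": {"count": 3, "projection_angle_deg": 45},
    "calibration": {"room": "living-room-1", "auto_eq": True},
}
```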
RESOURCES AND CAPABILITIES
[0099] As stated above, the adaptive audio ecosystem allows the content creator to embed the spatial intent of the mix (position, size, velocity, etc.) within the bitstream via metadata. This allows a tremendous amount of flexibility in the spatial reproduction of audio. From a spatial rendering standpoint, the adaptive audio format enables the content creator to adapt the mix to the exact position of the speakers in the listening environment, to avoid the spatial distortion caused by a playback system geometry that is not identical to that of the authoring system. In current audio reproduction systems, in which only audio for a speaker channel is sent, the content creator's intent is unknown for locations in the listening environment other than the fixed speaker locations. In the current channel/speaker paradigm, the only information that is known is that a specific audio channel must be sent to a specific speaker that has a predefined location in the listening environment. In the adaptive audio system, using metadata carried through the creation and distribution pipeline, the playback system can use this information to reproduce the content in a way that matches the content creator's original intent. For example, the relationship between the speakers is known for different audio objects. By providing the spatial location of an audio object, the content creator's intent is known, and it can be "mapped" onto the speaker configuration, including its location. With a dynamically rendering audio system, this rendering can be updated and improved by adding additional speakers. [00100] The system also enables the addition of guided three-dimensional spatial rendering. There have been many attempts to create a more immersive audio rendering experience through the use of new speaker designs and configurations. These include the use of bipole or dipole speakers, and of side-firing, rear-firing, and up-firing drivers. With previous channel and fixed speaker location systems, determining which audio elements should be sent to these modified speakers has been relatively difficult. Using the adaptive audio format, a rendering system has detailed and useful information about which audio elements (objects or otherwise) are suitable to be sent to a new speaker configuration. That is, the system allows control over which audio signals are sent to the front-firing drivers and which are sent to the up-firing drivers. For example, adaptive audio cinema content relies heavily on the use of overhead speakers to provide a greater sense of envelopment. These audio objects and this information can be sent to up-firing drivers to provide reflected audio in the listening environment and create a similar effect. [00101] The system also makes it possible to adapt the mix to the exact hardware configuration of the playback system. There are many different possible speaker types and configurations in rendering equipment such as televisions, home theaters, soundbars, portable music-player docks, and so on. When these systems are sent channel-specific audio information (i.e., standard multi-channel audio or left and right channels), the system must process the audio to appropriately match the capabilities of the rendering equipment. A typical example is when standard stereo (left, right) audio is sent to a soundbar that has more than two speakers. In current audio systems, in which only audio for a speaker channel is sent, the content creator's intent is unknown, and a more immersive audio experience made possible by the enhanced equipment must be created by algorithms that make assumptions about how to modify the audio for the playback hardware. An example of this is the use of PLII, PLII-z, or Next Generation Surround to "upmix" channel-based audio to more speakers than the original number of channel feeds. With the adaptive audio system, using metadata carried throughout the creation and distribution pipeline, a playback system can use this information to reproduce the content in a way that more closely matches the content creator's original intent. For example, some soundbars have side-firing speakers to create a sense of envelopment. With adaptive audio, the spatial information and content-type information (i.e., dialog, music, ambient effects, etc.) can be used by the soundbar, when controlled by a rendering system such as a TV or A/V receiver, to send only the appropriate audio to these side-firing speakers.
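The direct-versus-reflected routing decision described in paragraphs [00100]-[00101] can be pictured as a simple policy over the object metadata. The sketch below is a toy illustration (field names match the hypothetical metadata sketch above, not a real schema): highly directional content is kept on direct drivers, while diffuse content is routed to the up-firing feed.

```python
def route_subchannels(objects):
    """Toy routing policy: diffuse/ambient content (or content explicitly
    flagged as reflected) goes to up-firing drivers, everything else to the
    direct front-firing drivers."""
    feeds = {"front": [], "up": []}
    for obj in objects:
        if obj["content_type"] in ("ambience", "music") or \
           obj.get("render_mode") == "reflected":
            feeds["up"].append(obj)
        else:
            feeds["front"].append(obj)
    return feeds
```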
[00102] The spatial information carried by adaptive audio allows dynamic rendering of content with awareness of the location and type of the speakers present. Furthermore, information about the relationship of the listener or listeners to the audio playback equipment is now potentially available and can be used in rendering. Most game consoles include a camera accessory and intelligent image processing that can determine the position and identity of a person in the listening environment. This information can be used by an adaptive audio system to alter the rendering to more accurately convey the content creator's creative intent based on the listener's position. For example, in almost all cases, audio rendered for playback assumes that the listener is located in an ideal "sweet spot" that is often equidistant from each speaker and is the same position where the sound mixer was located during content creation. However, people are often not in this ideal position, and their experience does not match the mixer's creative intent. A typical example is when a listener is seated on the left side of the listening environment on a chair or sofa. In this case, sound reproduced from the nearer speakers on the left will be perceived as louder, skewing the spatial perception of the audio mix to the left. By understanding the listener's position, the system could adjust the audio rendering to lower the sound level of the left speakers and raise the level of the right speakers to rebalance the audio mix and make it perceptually correct. Delaying the audio to compensate for the listener's distance from the sweet spot is also possible. The listener's position could be detected through the use of a camera or a modified remote control with some built-in signaling that would indicate the listener's position to the rendering system. [00103] In addition to using standard speakers at fixed locations to address the listening position, it is also possible to use beam-steering technologies to create sound-field "zones" that vary depending on the listener's position and the content. Audio beamforming uses an array of speakers (typically 8 to 16 horizontally spaced speakers) and uses phase manipulation and processing to create a steerable sound beam. The beamforming speaker array allows the creation of audio zones in which the audio is primarily audible, which can be used to direct specific sounds or objects, with selective processing, to a specific spatial location. An obvious use case is to process the dialogue in a soundtrack using a dialogue-enhancement post-processing algorithm and beam that audio object directly to a user who is hearing-impaired.
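The core of such beam steering is classic delay-and-sum: each driver in the line array is delayed so the wavefronts add coherently in the chosen direction. The sketch below is a minimal illustration of that arithmetic only (no windowing, equalization, or zone shaping), with assumed array dimensions.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s

def steering_delays(num_drivers, spacing_m, angle_deg):
    """Per-driver delays (seconds) that steer a uniform linear array's beam
    by angle_deg from broadside (delay-and-sum beamforming). With 8-16
    drivers a few centimeters apart, these are fractions of a millisecond."""
    n = np.arange(num_drivers)
    d = n * spacing_m * np.sin(np.radians(angle_deg)) / SPEED_OF_SOUND
    return d - d.min()          # keep all delays non-negative

print(steering_delays(8, 0.04, 20.0) * 1000)  # delays in ms for a 20-degree beam
```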
MATRIX ENCODING AND SPATIAL UPMIXING
[00104] In some cases, audio objects may be a desired component of adaptive audio content; however, based on bandwidth limitations, it may not be possible to send both channel/speaker audio and audio objects. In the past, matrix encoding was used to carry more audio information than was possible for a given distribution system. For example, this was the case in early cinema, when multi-channel audio was created by sound mixers but the film formats only provided stereo audio. Matrix encoding was used to intelligently downmix the multi-channel audio to two stereo channels, which were then processed with certain algorithms to recreate from the stereo audio a close approximation of the multi-channel mix. Similarly, it is possible to intelligently downmix audio objects into the base speaker channels and, through the use of adaptive audio metadata and sophisticated time- and frequency-sensitive next-generation surround algorithms, to extract the objects and render them spatially correctly with an adaptive audio rendering system (a toy downmix sketch follows below). [00105] Additionally, when there are transmission-system bandwidth limitations for the audio (3G and 4G wireless applications, for example), there is also a benefit in transmitting multiple spatially distinct matrix-encoded beds along with individual audio objects. A use case for such a transmission methodology would be broadcasting a sports event with two distinct audio beds and multiple audio objects. The audio beds could represent the multi-channel audio captured in the bleacher sections of two different teams, and the audio objects could represent different announcers who might be sympathetic to one team or the other. Using standard encoding, a 5.1 representation of each bed, together with two or more objects, could exceed the bandwidth constraints of the transmission system. In this case, if each of the 5.1 beds were matrix-encoded into a stereo signal, then the two beds that were originally captured as 5.1 channels could be transmitted as 2-channel bed 1, 2-channel bed 2, object 1, and object 2 - only four audio channels instead of the 5.1 + 5.1 + 2 or 12.1 channels.
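As a hedged illustration of the matrix-encoding step, the sketch below implements one common textbook Lt/Rt-style downmix of a five-channel bed. The +/-90-degree phase shift on the surrounds is obtained from the imaginary part of the analytic (Hilbert) signal; the coefficients are typical values, not the patent's, and real encoders differ in detail.

```python
import numpy as np
from scipy.signal import hilbert

def lt_rt_downmix(L, R, C, Ls, Rs):
    """Matrix-encode L, R, C, Ls, Rs (equal-length arrays) into two channels.
    The surrounds are mixed in with opposite-sign 90-degree phase shifts so a
    decoder can later separate them from the front channels."""
    ls90 = np.imag(hilbert(Ls))   # Ls shifted by ~90 degrees
    rs90 = np.imag(hilbert(Rs))
    lt = L + 0.7071 * C - 0.7071 * ls90
    rt = R + 0.7071 * C + 0.7071 * rs90
    return lt, rt
```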
POSITION AND CONTENT-DEPENDENT PROCESSING
[00106] The adaptive audio ecosystem allows the content creator to create individual audio objects and add information about the content that can be carried to the playback system. This allows a great deal of flexibility in processing the audio before playback. Processing can be tailored to the object's position and type through dynamic control of speaker virtualization based on object position and size. Speaker virtualization refers to a method of processing audio so that a virtual speaker is perceived by the listener. This method is often used for stereo speaker playback when the source audio is multi-channel audio that includes surround speaker channel feeds. Virtual speaker processing modifies the surround channel audio in such a way that, when it is played back over the stereo speakers, the surround audio elements are virtualized to the sides and back of the listener, as if a virtual speaker were located there. Currently, the location attributes of the virtual speaker are static, since the intended location of the surround speakers was fixed. However, with adaptive audio content, the spatial locations of the different audio objects are dynamic and distinct (that is, unique to each object). It is possible that post-processing such as virtual speaker virtualization can now be controlled in a more informed way, by dynamically controlling parameters such as the positional angle of the speaker for each object and then combining the rendered outputs of several virtualized objects, to create a more immersive audio experience that more closely represents the sound mixer's intent. [00107] In addition to the standard horizontal virtualization of audio objects, it is possible to use perceptual height cues that process fixed channel and dynamic object audio and produce the perception of height reproduction from a standard pair of stereo speakers in their normal position in the horizontal plane. [00108] Certain enhancement effects or processes can be judiciously applied to suitable types of audio content. For example, dialogue enhancement can be applied to dialogue objects only. Dialogue enhancement refers to a method of processing audio that contains dialogue so that the audibility and/or intelligibility of the dialogue is increased and/or improved. In many cases, the audio processing applied to dialogue is inappropriate for non-dialogue audio content (i.e., music, ambient effects, etc.) and can result in objectionable audible artifacts. With adaptive audio, an audio object could contain only the dialogue in a piece of content and could be identified accordingly, so that a rendering solution selectively applies dialogue enhancement only to the dialogue content. Furthermore, if the audio object is only dialogue (and not a mix of dialogue and other content, as is often the case), then the dialogue-enhancement processing can process the dialogue exclusively (thereby limiting any processing performed on any other content). [00109] Similarly, audio response or equalization management can also be tailored to specific audio characteristics - for example, bass management (filtering, attenuation, gain) targeted at a specific object based on its type. Bass management refers to selectively isolating and processing only the bass (or lower) frequencies in a particular piece of content. With current audio systems and delivery mechanisms this is a "blind" process, that is, one applied to all of the audio. With adaptive audio, specific audio objects for which bass management is appropriate can be identified by metadata, and the rendering processing applied appropriately.
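A minimal sketch of the bass-management split is shown below, assuming a simple Butterworth crossover at an illustrative 80 Hz; with object metadata, this would be run only on the objects flagged as needing bass management rather than blindly on all audio.

```python
from scipy.signal import butter, sosfilt

def bass_manage(signal, fs, crossover_hz=80.0, order=4):
    """Split a feed at a crossover frequency: the lows go to the subwoofer
    feed, the highs stay on the main driver feed."""
    lo = butter(order, crossover_hz, btype="lowpass", fs=fs, output="sos")
    hi = butter(order, crossover_hz, btype="highpass", fs=fs, output="sos")
    return sosfilt(lo, signal), sosfilt(hi, signal)
```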
[00110] The adaptive audio system also facilitates object-based dynamic range compression. Traditional audio tracks have the same duration as the content itself, whereas an audio object may occur for only a limited amount of time in the content. The metadata associated with an object can contain level-related information about its peak and average signal amplitude, as well as its onset or attack time (particularly for transient material). This information would allow a compressor to better adapt its compression and time constants (attack, release, etc.) to better suit the content. [00111] The system also facilitates automatic speaker-room equalization. Speaker and listening-environment acoustics play a significant role in introducing audible coloration into the sound, thereby impacting the timbre of the reproduced sound. Furthermore, the acoustics are position-dependent due to reflections in the listening environment and variations in speaker directivity, and because of this variation the perceived timbre will vary significantly for different listening positions. An AutoEQ (automatic room equalization) function provided in the system helps mitigate some of these issues through automatic speaker-room spectral measurement and equalization, automated time-delay compensation (which provides proper imaging and possibly least-squares-based relative speaker position detection) and level setting, bass redirection based on speaker headroom capability, as well as optimal pairing of the main speakers with the subwoofer(s). In a home theater or other listening environment, the adaptive audio system includes certain additional functions, such as: (1) automated target-curve computation based on the playback room acoustics (which is considered an open research question for equalization in home listening environments), (2) the influence of modal decay control using time-frequency analysis, (3) understanding the parameters derived from measurements that govern envelopment/spaciousness/source width/intelligibility, and controlling them to provide the best possible listening experience, (4) directional filtering incorporating head models to match the timbre between the front and the "other" speakers, and (5) detecting the spatial positions of the speakers in a discrete configuration relative to the listener, and spatial remapping (e.g., Summit wireless would be an example). The timbre mismatch between speakers is especially revealed by certain content panned between a front anchor speaker (e.g., center) and the surround/rear/wide/height speakers. [00112] In general, the adaptive audio system also enables a compelling audio/video playback experience, particularly with the larger screen sizes in a home environment, if the reproduced spatial location of some audio elements matches the image elements on the screen. An example is having the dialogue in a film or television show spatially coincide with the person or character who is speaking on the screen. With normal speaker-channel-based audio, there is no easy method of determining where the dialogue should be spatially positioned to match the location of the person or character on the screen. With the audio information available in an adaptive audio system, this kind of audio/visual alignment could easily be achieved, even on home theater systems featuring increasingly larger screens. The spatial and visual positional alignment could also be used for non-character/dialogue objects such as cars, trucks, animation, and so on. [00113] The adaptive audio ecosystem also allows enhanced content management, by allowing a content creator to create individual audio objects and add information about the content that can be carried to the playback system. This allows a great deal of flexibility in the management of the audio content. From a content management standpoint, adaptive audio enables several things, such as changing the language of the audio content just by replacing a dialogue object, to reduce the content file size and/or reduce download time. Film, television, and other entertainment programs are typically distributed internationally. This often requires that the language in the piece of content be changed depending on where it will be played (French for films shown in France, German for TV shows shown in Germany, etc.). Today this often requires that a completely independent audio soundtrack be created, packaged, and distributed for each language. With the adaptive audio system and the inherent concept of audio objects, the dialogue for a piece of content could be an independent audio object. This allows the content's language to be easily changed without updating or changing other elements of the audio soundtrack, such as the music, effects, etc. This would apply not only to foreign languages but also to language inappropriate for a particular audience, targeted advertising, etc.
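The language-swap idea in paragraph [00113] reduces, at playback time, to filtering the object list. The sketch below is a toy illustration with invented field names: non-dialogue objects pass through untouched, and only the dialogue object matching the requested language is kept.

```python
def select_language(objects, language):
    """Keep music/effects objects as-is and retain only the dialogue object
    tagged with the requested language, instead of shipping a separate full
    soundtrack per language."""
    return [o for o in objects
            if o["content_type"] != "dialog" or o.get("language") == language]
```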
[00114] Aspects of the audio environment described in this document represent the reproduction of audio or audio/visual content through suitable speakers and playback devices, and may represent any environment in which a listener is experiencing playback of the captured content, such as a cinema, concert hall, outdoor theater, a home or room, a listening booth, a car, a game console, a headphone or headset system, a public address (PA) system, or any other playback environment. Although the embodiments have been described primarily with respect to examples and deployments in a home theater environment in which the spatial audio content is associated with television content, it should be noted that the embodiments can also be deployed in other systems. Spatial audio content comprising object-based audio and channel-based audio can be used in conjunction with any related content (audio, video, graphics, etc.) or can constitute standalone audio content. The playback environment can be any suitable listening environment, from headphones or near-field monitors to small or large rooms, cars, outdoor arenas, concert halls, and so on. [00115] Aspects of the systems described in this document may be deployed in a suitable computer-based sound-processing network environment for processing digital or digitized audio files. Portions of the adaptive audio system may include one or more networks comprising any desired number of individual machines, including one or more routers (not shown) that serve to buffer and route the data transmitted between the computers. Such a network can be built on several different network protocols, and can be the Internet, a Wide Area Network (WAN), a Local Area Network (LAN), or any combination thereof. In an embodiment in which the network comprises the Internet, one or more machines can be configured to access the Internet through web browser programs. [00116] One or more of the components, blocks, processes, or other functional components may be implemented through a computer program that controls the execution of a processor-based computing device of the system. It should also be noted that the various functions disclosed in this document can be described using any number of combinations of hardware, firmware, and/or data and/or instructions embodied in various machine-readable or computer-readable media, in terms of their behavioral, register-transfer, logic, and/or other characteristics. Computer-readable media in which such formatted data and/or instructions may be embodied include, but are not limited to, physical (non-transitory) media and non-volatile storage media in various forms, such as optical, magnetic, or semiconductor storage media. [00117] Unless the context clearly requires otherwise, throughout the description and the claims, the words "comprise", "comprising", and the like are to be construed in an inclusive sense, as opposed to an exclusive or exhaustive sense; that is to say, in the sense of "including, but not limited to". Words using the singular or plural number also include the plural or singular number, respectively. Additionally, the words "in this application", "in this document", "above", "below", and words of similar import refer to this application as a whole and not to any particular portion of this application. When the word "or" is used in reference to a list of two or more items, that word covers all of the following interpretations of the word: any of the items in the list, all of the items in the list, and any combination of the items in the list.
[00118] Although one or more implementations have been described by way of example and in terms of specific embodiments, it is to be understood that the one or more implementations are not limited to the disclosed embodiments. On the contrary, they are intended to cover various modifications and similar arrangements, as would be apparent to those skilled in the art. Therefore, the scope of the appended claims should be accorded the broadest interpretation so as to cover all such modifications and similar arrangements.
Claims (10)
[0001] 1. System for rendering sound using reflected sound elements, comprising: an array of audio drivers for distribution around a listening environment, at least one driver of the array of audio drivers being an up-firing driver configured to project sound waves toward a ceiling of the listening environment for reflection down to a listening area within the listening environment; a renderer configured to receive and process a bitstream including audio streams and one or more sets of metadata that are associated with each of the audio streams and that specify a playback location in the listening environment for audio objects of a respective audio stream, the audio streams comprising one or more reflected audio streams and one or more direct audio streams, the renderer being further configured to render audio objects that are to be rendered above the head of a listener in the listening area using an up-firing driver and height information related to one or more of the audio objects; and a playback component coupled to the renderer and configured to render the audio streams into a plurality of audio feeds corresponding to the array of audio drivers in accordance with the one or more sets of metadata, and wherein the one or more reflected audio streams are transmitted to the at least one up-firing driver; characterized by the fact that the system performs signal processing to introduce perceptual height cues into the reflected audio streams fed to the at least one up-firing driver, the perceptual height cues being derived by at least partially removing from the reflected audio streams a first height cue corresponding to a physical speaker location in the listening environment and at least partially inserting into the reflected audio streams a second height cue corresponding to a reflected speaker location. [0002] 2. System according to claim 1, characterized by the fact that each audio driver of the audio driver array is uniquely addressable according to a communications protocol used by the renderer and by the playback component. [0003] 3. System according to claim 2, characterized by the fact that the at least one audio driver comprises one of: a side-firing driver and an up-firing driver, and wherein the at least one audio driver is further embodied in one of: a standalone driver within a speaker enclosure, and a driver placed next to one or more front-firing drivers within a unitary speaker enclosure. [0004] 4. System according to claim 3, characterized by the fact that the array of audio drivers comprises drivers that are distributed around the listening environment in accordance with a defined surround sound configuration. [0005] 5. System according to claim 4, characterized by the fact that the listening environment comprises a home environment, and the renderer and the playback component comprise part of a home audio system, and further wherein the audio streams comprise audio content selected from the group consisting of: cinema content transformed for playback in a home environment, television content, user-generated content, computer game content, and music. [0006] 6. System according to claim 4, characterized by the fact that a set of metadata associated with the audio stream transmitted to the at least one driver defines one or more characteristics pertaining to the reflection. [0007]
System according to claim 6, characterized in that the metadata set supplements a set of base metadata that includes metadata elements associated with an object-based stream of spatial audio information, and the Metadata elements for object-based stream specify spatial parameters that control the reproduction of a corresponding object-based sound and comprise one or more of: sound position, sound width, and sound velocity. [0008] 8. System, according to claim 7, characterized in that the set of metadata also includes metadata elements associated with a channel-based stream of the spatial audio information, and the metadata elements associated with each stream channel-based comprise ambient sound channel designations of the audio drivers in the defined surround setting. [0009] 9. System according to claim 6, characterized in that the at least one driver is associated with a microphone placed in the listening environment, the microphone being configured to transmit configuration audio information that encapsulate characteristics of the listening environment for a calibration component coupled to the renderer, and where the audio configuration information is used by the renderer to define or modify the set of metadata associated with the audio stream transmitted to the at least one audio driver. [0010] 10. System according to claim 1, characterized in that the at least one driver comprises one of: a manually adjustable audio transducer within a housing that is adjustable with respect to the sound firing angle relative to a plane of the listening environment and an electrically controllable audio transducer within a housing that is automatically adjustable in relation to the sound firing angle.
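The characterizing feature of claim 1 is a height-cue substitution: the reflected feed is filtered so that the spectral cue tied to the speaker's actual height is suppressed while a cue consistent with an overhead image is imposed. Below is a minimal Python sketch of that idea using SciPy; the cue frequencies (7 kHz and 10 kHz), the notch/peak filter shapes, the Q value, and the name `height_cue_filter` are illustrative assumptions, not values or methods taken from the patent (pinna-related elevation cues generally lie in the 4-12 kHz region).

```python
import numpy as np
from scipy.signal import iirnotch, iirpeak, lfilter

FS = 48000  # assumed sample rate in Hz

def height_cue_filter(x: np.ndarray,
                      physical_cue_hz: float = 7000.0,
                      virtual_cue_hz: float = 10000.0,
                      q: float = 2.0) -> np.ndarray:
    """At least partially remove the spectral cue tied to the driver's
    physical height, then insert a cue consistent with an overhead image."""
    b_rm, a_rm = iirnotch(physical_cue_hz, q, fs=FS)  # first height cue: removed
    b_in, a_in = iirpeak(virtual_cue_hz, q, fs=FS)    # second height cue: inserted
    return lfilter(b_in, a_in, lfilter(b_rm, a_rm, x))

# Example: condition one second of a reflected stream before it is fed
# to the upward-firing driver.
reflected_stream = np.random.randn(FS)
upward_feed = height_cue_filter(reflected_stream)
```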
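Claims 1 and 6-8 describe layered metadata: base object metadata carrying spatial parameters (position, width, velocity), supplemented by a reflection-specific set for streams destined for upward-firing drivers. The dataclass below is a hypothetical serialization of that layering; every field name and the normalized-coordinate convention are assumptions, since the claims do not define a concrete bitstream syntax.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

@dataclass
class ObjectMetadata:
    # Base metadata elements for an object-based stream (claim 7)
    position: Tuple[float, float, float]  # playback location; z encodes height
    width: float                          # apparent source width
    velocity: Tuple[float, float, float]  # sound velocity for moving objects
    # Supplemental elements for reflected rendering (claims 1 and 6)
    render_mode: str = "direct"           # "direct" or "reflected"
    reflection: Dict[str, float] = field(default_factory=dict)

# An overhead effect routed to the reflected path; z = 1.0 stands for
# ceiling height in these (assumed) normalized room coordinates.
overhead_fx = ObjectMetadata(
    position=(0.5, 0.5, 1.0),
    width=0.2,
    velocity=(0.0, 0.0, 0.0),
    render_mode="reflected",
    reflection={"ceiling_absorption": 0.3, "path_gain_db": -3.0},
)
```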
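Claim 2 requires each driver in the array to be uniquely addressable over a communications protocol shared by the renderer and the playback component. Here is a minimal routing sketch, assuming string addresses and a stub `transmit()` in place of the real transport, and reusing `height_cue_filter` from the claim-1 sketch above for the upward-firing feeds:

```python
import numpy as np

# Assumed address strings and driver types; the patent only requires that
# each driver be individually addressable, not this particular scheme.
DRIVER_ARRAY = {
    "front_left":  "front",
    "front_right": "front",
    "top_left":    "upward",   # upward-firing drivers receive reflected streams
    "top_right":   "upward",
}

def transmit(address: str, samples: np.ndarray) -> None:
    # Stand-in for the real transport protocol; here we just log the send.
    print(f"-> {address}: {samples.size} samples")

def play(feeds: dict) -> None:
    """Route each rendered feed to its uniquely addressed driver, applying
    the perceptual height processing only to upward-firing feeds."""
    for address, samples in feeds.items():
        if DRIVER_ARRAY[address] == "upward":
            samples = height_cue_filter(samples)  # sketch from claim 1 above
        transmit(address, samples)

play({"front_left": np.zeros(480), "top_left": np.zeros(480)})
```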
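Claim 9 closes the loop with a microphone that reports room characteristics to a calibration component, which the renderer then uses to define or modify the reflected-stream metadata. As a toy example of the kind of room characteristic such a component could extract, the sketch below estimates reverberation time (RT60) from a simulated impulse response via Schroeder backward integration; both the estimator and the simulated capture are assumptions, not the patent's calibration method.

```python
import numpy as np

def estimate_rt60(ir: np.ndarray, fs: int = 48000) -> float:
    """Estimate reverberation time from an impulse response using Schroeder
    backward integration (a standard, simplified approach)."""
    energy = np.cumsum(ir[::-1] ** 2)[::-1]            # remaining energy per sample
    decay_db = 10.0 * np.log10(energy / energy[0] + 1e-12)
    crossed = np.where(decay_db < -60.0)[0]            # first -60 dB crossing
    return crossed[0] / fs if crossed.size else ir.size / fs

# Simulated decaying room response in place of a real microphone capture;
# the result would feed the renderer's metadata update for reflected streams.
fs = 48000
t = np.linspace(0.0, 1.0, fs)
ir = np.random.randn(fs) * np.exp(-8.0 * t)            # diffuse decaying tail
print(f"estimated RT60: {estimate_rt60(ir, fs):.2f} s")
```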
Patent family:
Publication number | Publication date
CN107454511A | 2017-12-08
CN107509141A | 2017-12-22
KR20150038487A | 2015-04-08
US20180020310A1 | 2018-01-18
RU2015111450A | 2016-10-20
CN104604256B | 2017-09-15
BR112015004288A2 | 2017-07-04
RU2602346C2 | 2016-11-20
EP2891337A1 | 2015-07-08
HK1205846A1 | 2015-12-24
EP2891337B8 | 2016-12-14
US20150350804A1 | 2015-12-03
ES2606678T3 | 2017-03-27
US9794718B2 | 2017-10-17
JP2015530824A | 2015-10-15
CN104604256A | 2015-05-06
EP2891337B1 | 2016-10-05
KR101676634B1 | 2016-11-16
CN107509141B | 2019-08-27
WO2014036085A1 | 2014-03-06
US10743125B2 | 2020-08-11
US20210029482A1 | 2021-01-28
JP6167178B2 | 2017-07-19
Legal status:
2018-11-21 | B06F | Objections, documents and/or translations needed after an examination request [chapter 6.6 patent gazette]
2019-12-31 | B06U | Preliminary requirement: requests with searches performed by other patent offices; procedure suspended [chapter 6.21 patent gazette]
2021-03-09 | B09A | Decision: intention to grant [chapter 9.1 patent gazette]
2021-05-04 | B16A | Patent or certificate of addition of invention granted [chapter 16.1 patent gazette] | Free-format text (translated from Portuguese): "TERM OF VALIDITY: 20 (TWENTY) YEARS COUNTED FROM 08/28/2013, SUBJECT TO THE LEGAL CONDITIONS."
Priority:
Application number | Filing date | Patent title
US 61/695,893 (US201261695893P) | 2012-08-31 | (US provisional application)
PCT/US2013/056989 (WO2014036085A1) | 2013-08-28 (priority 2012-08-31) | Reflected sound rendering for object-based audio